ABSTRACT


The recent trend towards higher levels of automation in complex systems, 
such as in nuclear power plants, air-traffic control and flight management, is 
changing the role of the human operator from one of a controller to one of a 
supervisory decision-maker. The operator's primary responsibility in this new 
role is to extract information from his environment, and to integrate it for 
action selection and its implementation. The present analytic and experimental 
research has sought to understand human monitoring, information-processing and 
task selection procedures in dynamic multi-task environments, as a preliminary 
step towards analyzing and evaluating the human component of a supervisory 
control system. 

A simple yet realistic computer representation of the supervisory decision 
situation is developed. The experimental paradigm retains the essence of the 
multi-task decision problem by presenting the human with a dynamic situation 
wherein tasks of different value, time requirement and deadline compete for his 
attention. Via this framework, the effects of various task related variables 
on the human decision-processes are studied. 

A normative dynamic decision model (DDM) of human task sequencing
performance is developed. The analytic framework of the DDM is based on modern
control, estimation and semi-Markov decision process theories, which provide 
a general methodology for analyzing dynamic decision-making under uncertainty. 
Two novel features of DDM are its explicit incorporation of human limitations, 
such as reaction time delays, randomness, limited resolving power and limited 
information-processing capacity, and its suitability to assimilate new elements 
of the decision task as they become considered and understood. Also, the 
analytic framework of the DDM has been shown to subsume several problems in 
single-processor sequencing theory, Markov decision theory and priority queueing 
systems. 

In order to validate the model, several time-history and scalar measures 
of performance are proposed. Excellent model-data agreement is obtained for 
all the experimental conditions studied. Moreover, the model has been shown to 
represent human decision behavior significantly better than several heuristic 
sequencing rules of scheduling theory. The model has the potential for use in 
computer-aiding, and could form a significant step towards the modeling of
multi-human behavior in complex, multi-level, multi-task systems.

I. INTRODUCTION AND PROBLEM FORMULATION


An emerging trend in man-machine systems appears to be away from 
manual control to partial, if not full, automation. In this regard, the 
role of the human operator is shifting from one of a direct system
controller to that of a monitor of multiple tasks, or a supervisor of
several semi-automated subsystems. The operator's primary task in these
systems is to extract information from his environment, and integrate
this information for action selection and implementation. In this
context, monitoring, information-processing and dynamic (real-time)
decision-making skills of the human operator gain prominence over his
sensor/motor skills. In order to properly analyze and evaluate the human
component of a supervisory control system, an understanding of the human
limitations and capabilities as an information-processor and dynamic 
decision-maker is essential. 

There are two feasible paths that one can follow to develop human 
operator decision models, supported by concomitant experimental results, 
in complex supervisory control systems. The first approach starts with 
one task (or subsystem) and several humans to explore
information-sharing and inter-human dynamics, and then adds more tasks
(or subsystems). The second approach begins by studying single human
dynamic decision-making among multiple tasks, and next introduces
multiple decision-makers, composed of human and, possibly, non-human
decision-makers. The latter route is advocated in this effort.

The present research seeks to understand human information-processing
and task selection procedures in dynamic multi-task environments.
The approach is to assimilate the results of a joint experimental and 
analytic program into a normative dynamic decision model (DDM) of human 
task sequencing performance. To this end, a general multi-task decision 
problem is considered wherein tasks of different value, duration and 
deadline compete for the operator's attention. This situation occurs 
in targeting selection, air-traffic control, multiple remotely piloted 
vehicle (m-RPV) control, process control, power system regulation,
production scheduling, as well as in many other supervisory control
systems. The model that has emerged may be viewed as a basic building block
in the comprehensive understanding of decision-making procedures, an 
understanding that could facilitate the modeling of multi-human behavior 
in complex, multi-level, multi-task systems. 

1.1 Multi-task Decision Problem 

We believe that a complete theory of human behavior in multi-task 
systems, analogous to Edwards' classification of human response theory 
[1], should consist of three parts: (i) a theory of how potential tasks 
are identified for consideration; (ii) a theory of the process of
consideration by which all tasks but one are eliminated; and (iii) a theory
about how the chosen task is executed. The last topic involves the 
study of human implementation skills, which are of secondary importance 
in supervisory control situations. The first topic, that of identifying 
potential tasks for consideration, is the problem of creative thinking, 
of which little of significance is known at present. However, this may 
not be restrictive in most multi-task systems of the type discussed 
above. In these systems, the tasks are immediately identified, e.g., 
once a target is detected. The topic of selecting a task for action 
from amongst many candidate tasks involves monitoring,
information-processing and dynamic (real-time) decision-making, and is the
problem of interest here.

Fig. 1 shows the fundamental decision-loop that is addressed in 
this work. The human decision-process involves 1) whether to process a 
task or gather more information (i.e., monitor); and 2) which of N tasks 
(N is time-varying) to act upon in order to maximize the system
performance (e.g., maximize reward, minimize regret, etc.). The
decision-loop is dynamic in nature. As time evolves, tasks of different
value, duration (processing time) and opportunity window (deadline) demand
the human's attention, while others depart. The opportunity windows shrink with
time as the tasks approach their deadlines. 

In the following, we provide a taxonomy for behavioral decision 
theory and show that the multi-task decision problem (MTDP) belongs to 
the most general class of decision-processes studied to date, viz., the 
semi-Markov decision processes (SMDP). We also summarize the results of 
a major literature survey on behavioral decision theory [2], and
critically evaluate the previous (albeit limited) research on multi-task
decision-making, in order to put the nature of the present work in
perspective.

1.2 A Taxonomy for Behavioral Decision Theory

A decision-maker's (DM's) choice in any decision task is a
consequence of what he can do, what he knows and what he wants [3]. "What he
can do" represents the alternatives (possible responses) available to the
DM. "What he knows" refers to the information that the DM has of the
decision situation. This can range from the deterministic situations where
all the relevant variables of the decision process are known, to the
highly probabilistic situations where little information is available
about any variable of interest. Finally, "what he wants" pertains to
the DM's perception of the task objectives and his preferences for the
various outcomes of a decision. These three concepts are fundamental to
every decision-making process.

FIG. 1: DYNAMIC MONITORING/DECISION LOOP FOR A SINGLE OPERATOR IN A
MULTI-TASK ENVIRONMENT

Most theories of individual choice behavior can be conveniently 
dichotomized into two distinct classes depending on the nature of the 
decision task, viz., single-stage and multi-stage decision theories. A 
detailed classification of individual choice theories is shown in 
Fig. 2 and is clarified below. 



Legend:

S = set of states of the system
E = set of events
D = set of possible decisions (actions)
T = transformation rule
r = reward function


FIG. 2: A CLASSIFICATION OF INDIVIDUAL CHOICE THEORIES 


1.2.1 Single-Stage Decision Theory 

A single-stage or static decision process may be represented as 
in Fig. 3.

FIG. 3: FLOW DIAGRAM OF A SINGLE-STAGE DECISION PROCESS


We see that a static decision process can be conveniently characterized
by the triple $(E, D, r)$ where

$E = \{e\}$ = a finite, non-empty set of external events (also known as
states of nature, stimuli, hypotheses or diagnoses).

$D = \{d\}$ = a finite, non-empty set of possible decisions representing
"what he can do" (also commonly referred to as alternatives,
responses, or actions).

$r = r(e,d)$ = a reward (return) uniquely associated with the combined
occurrence of event, $e$, and decision, $d$.

Single-stage decision-making problems can be further classified into two
categories depending on the information that the DM possesses (i.e.,
"what he knows") about $E$. These are decisions with certainty (riskless)
and decisions with uncertainty (under risk). In the former category,
each decision guarantees a reward with certainty, i.e., $E$ is completely
known. In the latter category, only a probability can be assigned to
each $e \in E$ such that $\sum_{e \in E} p(e) = 1$.

The mechanics of a static decision problem are as follows: the DM
chooses and executes a decision, $d$; an event, $e$, occurs; he receives a
reward, $r(e,d)$, determined by the joint occurrence of the event, $e$, and
decision, $d$; and his decisions are mutually independent, i.e., he never
makes another decision based on whatever he may have learned. It is
frequently assumed that the DM chooses his decision to maximize the
expected reward or to minimize regret (i.e., "what he wants"). The widely
studied single-choice gambling paradigms are examples of single-stage
decision tasks.
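
The two choice criteria mentioned above can be illustrated with a minimal Python sketch. The reward table, the event probabilities and the function names are hypothetical, and "minimize regret" is taken here to mean the minimax-regret rule, which is an assumption rather than a definition given in this report.

```python
# Illustrative only: the reward table and probabilities are hypothetical.

def expected_reward_choice(reward, p):
    """Return the decision d maximizing sum_e p(e) * r(e, d).

    reward[d][e] holds r(e, d); p[e] holds the probability of event e.
    """
    def expected(d):
        return sum(p[e] * reward[d][e] for e in p)
    return max(reward, key=expected)


def minimax_regret_choice(reward):
    """Return the decision whose worst-case regret is smallest.

    Regret of d under e = (best reward attainable under e) - r(e, d).
    """
    events = next(iter(reward.values())).keys()
    best = {e: max(reward[d][e] for d in reward) for e in events}

    def worst_regret(d):
        return max(best[e] - reward[d][e] for e in events)
    return min(reward, key=worst_regret)


reward = {"d1": {"e1": 10.0, "e2": 0.0},
          "d2": {"e1": 4.0, "e2": 5.0}}
p = {"e1": 0.3, "e2": 0.7}

print(expected_reward_choice(reward, p))  # d2 (expected 4.7 versus 3.0)
print(minimax_regret_choice(reward))      # d1 (worst regret 5 versus 6)
```

In this toy table the two criteria disagree, which shows that "what he wants" must be specified before a static decision can be called optimal.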

1.2.2 Multi-Stage Decision Theory 

In single-stage decision-making, the DM must make a single choice 
from among a number of alternatives. But in most man-machine and
organizational systems, the DM seldom makes a single isolated decision. These
situations require that the DM evaluate a number of objects or hypotheses 
simultaneously as the evidence accumulates sequentially and/or that he 
make several interdependent decisions. Thus, an understanding of human 
behavior in multi-stage decision-processes is fundamental to modeling 
human behavior in dynamic and uncertain environments. 

In a multi-stage decision process, the DM makes a sequence of
decisions. These types of processes consist of a series of stages such that
the output of one stage becomes the input to the succeeding stage. Fig. 4
is representative of a multi-stage decision process [4].

FIG. 4: FLOW DIAGRAM OF AN N-STAGE DECISION PROCESS

Referring to Fig. 4, a multi-stage decision process can be characterized
by the pentad $(S, D, E, T, r)$ where

$S = \{s_i\}$ = set of states of the system
$D = \{d_i\}$ = set of possible decisions
$E = \{e_i\}$ = set of events
$T = \{t_i\}$ = set of transformation rules (laws of motion or transition
functions) that describe the changes in state at each stage $i$
$r = \{r_i\}$ = set of rewards associated with each state transition

The stage-to-stage state transition is governed by the transformation
rule

$$ s_{i+1} = t_i(s_i, d_i, e_i) \tag{1.1} $$

The reward at stage $i$ is

$$ r_i = r_i(s_i, d_i, s_{i+1}); \quad i \ge 1 \tag{1.2} $$

The DM's information can range from complete knowledge of the event set,
$\{e_i\}$, and the set of transformation rules, $\{t_i\}$, to little or no
knowledge of these variables. Notice that the transformation rule, $t_i$, and
the reward, $r_i$, can be stage dependent (i.e., non-stationary). It is
commonly assumed that the DM chooses his decisions to maximize his
expected reward over N stages. The horizon N may or may not be known to
the DM.

In studying multi-stage decision processes, a distinction is often
maintained between sequential and dynamic decision processes (see Fig.
2). In sequential decision problems, the evolution of the state of the
system, $s_i$, is independent of the DM's decisions. That is, Eq. (1.1)
becomes

$$ s_{i+1} = t_i(s_i, e_i) \tag{1.3} $$

Thus, a sequential decision task is an uncontrolled decision process.
It consists of a sequence of static decision problems repeated
periodically and independently. The information gained from earlier decisions
is useful in making later decisions, but the earlier decisions do not
affect the transformation rule, $t_i$. The operation of a sequential
decision process is as follows: given that the system is in state $s_i$ at
the beginning of a stage $i$, the DM makes a decision, $d_i$; the system moves
to state $s_{i+1}$ (which may or may not be identical to $s_i$) according to the
transformation rule; and the DM receives a reward $r_i(s_i, d_i, s_{i+1})$
associated with this transition. Examples of sequential decision tasks are
system failure detection, revision of opinion, display monitoring, asset
selling and optional stopping.

The dynamic decision processes are multi-stage decision tasks in 
which the stage-to-stage changes in the state of the system are directly 
affected by the DM's previous decisions, as well as by environmental 
factors (events) over which the DM exercises no control (see Eq. (1.1)), 
i.e., it is a controlled decision process. The set of alternatives and 
the information available at later stages are contingent upon earlier 
decisions. Thus, the DM has to consider the effect of each of his 
decisions on the future states of the system and, consequently, on his 
future decisions. The dynamic decision processes can be further
classified into two categories, viz., Markovian and semi-Markovian (see
Fig. 2). The Markov decision process (MDP) has the property that the stages
are of deterministic duration, or their duration is irrelevant to the 
decision problem. Multi-stage betting games, inventory control, search 
theory and resource allocation are examples of MDP. 
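
To make the controlled, multi-stage structure concrete, the following is a minimal Python sketch of backward induction (dynamic programming) for a finite-horizon Markov decision process. It assumes stationary, finite sets, known event probabilities, a transition function of the form t(s, d, e) and a reward of the form r(s, d, s'). The two-state "machine" example and all names are hypothetical and serve only as an illustration.

```python
def backward_induction(states, decisions, events, t, r, n_stages):
    """Maximize expected total reward over n_stages.

    events maps each event e to its probability p(e);
    t(s, d, e) returns the next state; r(s, d, s_next) returns the reward.
    Returns (value at the first stage, list of per-stage policies).
    """
    value = {s: 0.0 for s in states}  # no reward after the final stage
    policy = []
    for _ in range(n_stages):
        new_value, stage_policy = {}, {}
        for s in states:
            best_d, best_q = None, float("-inf")
            for d in decisions:
                q = sum(p * (r(s, d, t(s, d, e)) + value[t(s, d, e)])
                        for e, p in events.items())
                if q > best_q:
                    best_d, best_q = d, q
            new_value[s], stage_policy[s] = best_q, best_d
        value = new_value
        policy.insert(0, stage_policy)  # earliest stage ends up first
    return value, policy


# Hypothetical toy problem: a machine that is either "up" or "down".
def t(s, d, e):
    if d == "repair":
        return "up"
    return "up" if (s == "up" and e == "ok") else "down"


def r(s, d, s_next):
    if d == "repair":
        return -0.5
    return 1.0 if (s == "up" and s_next == "up") else 0.0


value, policy = backward_induction(
    ["up", "down"], ["run", "repair"], {"ok": 0.8, "fail": 0.2}, t, r, 2)
print(policy[0]["down"], policy[1]["down"])  # repair run
```

With two stages to go, repairing a failed machine pays for itself; at the last stage it does not. This is the sense in which each decision must account for its effect on future states.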

The semi-Markov decision process (SMDP), or Markov renewal decision
process, is characterized by the fact that the time between state
transitions is a random variable. The decision epochs in a stationary SMDP
are the times of state transitions. At a decision epoch $i$, the system is
in state $s_i$. The DM chooses a feasible decision, $d_i$; the system moves to
state $s_{i+1}$ after a random holding time, $\tau_i$, according to the
transformation rule; and the DM receives a reward
$r(s_i, d_i, \tau_i, s_{i+1})$, associated with this transition. The process
continues for finite or infinite time.

A complete characterization of a semi-Markov decision process includes
the hexad $(S, D, E, T, H, r)$ where $S$, $D$, $E$, $T$ and $r$ are as defined
earlier, and $H$ is a holding time function that determines how long the
system stays in a given state before making a transition to another
specified state. The process descriptors $(S, D, E, T, H, r)$ can be time
dependent. The non-stationarity of the decision process can enter either
in the form of time dependent dimension of the spaces $(S, D, E)$, or in
the form of time varying nature of the transformation rules, $T$; the
holding time functions, $H$; and the reward structure, $r$. If a process is
a non-stationary SMDP, the notation $(S(t), D(t), E(t), T(t), H(t), r(t))$
is employed to emphasize its time dependence. Here, the decisions are, in
general, continuous functions of time. Some examples of SMDP are targeting
selection, air-traffic control, multi-RPV control, industrial process
control, power system regulation and many other multi-task systems. The
analysis of these systems is arduous, in view of the non-stationarity of
the underlying SMDP. Virtually no significant research has been done by
behavioral decision theorists using semi-Markov decision paradigms.
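
The distinguishing feature of the SMDP, a random holding time between decision epochs, can be sketched in Python as a simple simulation. The state names, the policy, the holding-time distributions and the reward rule below are hypothetical placeholders chosen only to make the sketch runnable; they are not the task model of this report.

```python
import random


def simulate_smdp(s0, policy, transition, holding_time, reward, horizon, rng):
    """Simulate one sample path until the clock passes `horizon`.

    At each decision epoch the DM picks d = policy(s, t); the system stays
    put for a random holding time tau, then jumps to the next state.
    Returns (total reward, list of (epoch time, state, decision, tau)).
    """
    t, s, total, epochs = 0.0, s0, 0.0, []
    while t < horizon:
        d = policy(s, t)
        tau = holding_time(s, d, rng)
        s_next = transition(s, d, rng)
        total += reward(s, d, tau, s_next)
        epochs.append((t, s, d, tau))
        t, s = t + tau, s_next
    return total, epochs


# Hypothetical toy process: either a task is waiting or the screen is empty.
def policy(s, t):
    return "process" if s == "task_waiting" else "monitor"


def holding_time(s, d, rng):
    # Processing takes 1 to 5 s; monitoring lasts an exponential time.
    return rng.uniform(1.0, 5.0) if d == "process" else rng.expovariate(1.0)


def transition(s, d, rng):
    return rng.choice(["task_waiting", "empty"])


def reward(s, d, tau, s_next):
    return 1.0 if d == "process" else 0.0


total, epochs = simulate_smdp("empty", policy, transition, holding_time,
                              reward, horizon=90.0, rng=random.Random(0))
print(len(epochs), total)
```

Because the decision epochs are not equally spaced, the number of decisions made in a fixed interval is itself random, which is one reason the analysis is harder than for an MDP.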

1.3 Summary of Research on Behavioral Decision Theory [2] 

A brief and selective overview of the theories of individual choice
behavior in static and multi-stage decision tasks was provided earlier
in [2]. The primary purpose of this review was to investigate the
applicability of this body of knowledge to model human information-processing
and decision-making skills in multi-task systems. The main conclusion
was that the multi-task decision problems are more general than any
considered in behavioral decision theory to date. However, there exist
bits and pieces of relevant models and a wide range of experimental
literature that may be useful in modeling human behavior in multi-task
systems. Specifically, the following observations of the review are
relevant to our discussion. The reader is referred to [2] for additional
details.

1.3.1 Single-Stage Decision-Making

Most of the literature on behavioral decision theory is devoted to
single-stage (static) decision-making under risk. The models of risky
decision behavior may be characterized by two alternative descriptions
of the decision task. The first modeling approach, rooted in mathematics
and economics, describes the decision task in terms of probability
distributions over sets of outcomes (events) with little or no attention
paid to the underlying psychological processes of the individual DM.
This approach led to such moment-based models as the Expected Value (EV),
the Expected Utility (EU), the Subjectively Expected Utility (SEU), and
the Risk Preference models. The second modeling approach, rooted mainly
in psychology, characterizes decision tasks in terms of multi-dimensional
stimuli. It assumes that each stimulus forms a basic risk dimension, and
that the DM integrates these dimensions into a judgement or decision.
Thus, this approach led to explanatory models that view decision-making
under risk as a form of information-processing behavior.

The dominant moment-based model for single-stage decision-making is
the subjectively expected utility (SEU) model proposed by Edwards [5].
In this model, the DM is assumed to maximize the subjectively expected
utility of an alternative, $d$, given by

$$ SEU(d) = \sum_{e \in E} p_s(e) \, U[r(e,d)] \tag{1.4} $$

where $p_s(e)$ is the subjective (perceived) probability of the event, $e$;
and $U[r(e,d)]$ is the subjective value (utility) function of the event, $e$.

In assessing the potential application of moment related versus 
multi-dimensional stimuli models to static decision-making under risk, the 
following observation was made in [2]: for normative (predictive)
purposes, models based on moments can serve as a first approximation or as a
formal standard against which to compare actual performance. 

1.3.2 Multi-Stage Decision-Making 

The existing literature on multi-stage decision-making problems 
may be grouped under three headings: sequential statistical inference, 
optional stopping and dynamic decision-making. The topic of statistical 
inference is concerned with the information-processing (diagnostic) 
ability of the humans, i.e., the human's ability to assess and revise 
probabilities. The optional stopping problem combines
information-processing with simple (usually binary) action selection. Finally, the
existing literature on dynamic decision-making is mainly concerned with 
action (control) selection with very little or no consideration to the 
aspect of information-processing. It should be emphasized that virtually 
no significant research has been done by behavioral decision theorists 
using real-time decision paradigms. 

The literature in the area of sequential probability inference 
shows two different approaches to the modeling problem. The first 
approach, advanced by statisticians and psychologists, employs Bayes' 
rule as a normative representation of how a DM should revise his
probability estimates in light of new information. This approach led to the
study of "conservatism" - a suboptimal human behavior that produces
posterior probabilities nearer to the prior probabilities than those
specified by Bayes' rule. The second approach, proposed mainly by
psychologists, argues that the human is a selective, sequential
information-processor with limited capacity and that this leads him to apply simple
heuristics and cognitive strategies. This approach led to the discovery 
of such judgemental heuristics as representativeness, availability, and 
adjustment and anchoring, which were found to determine probabilistic 
inferences in many tasks. However, these findings can only be described 
in qualitative terms and, as yet, no quantitative descriptive theory based 
on heuristics has emerged. 
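
The notion of conservatism relative to Bayes' rule can be illustrated with a short Python sketch. The normative update is standard; the "conservative" variant, which damps the likelihoods with an exponent c between 0 and 1, is only one illustrative way to produce posteriors nearer to the priors and is an assumption, not a model proposed in this report. The hypothesis names and numbers are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Normative revision: posterior(h) is proportional to prior(h) * P(datum | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}


def conservative_update(prior, likelihood, c):
    """Illustrative conservative revision (assumption): likelihoods are
    raised to a power c in (0, 1), so each datum moves opinion less than
    Bayes' rule prescribes. c = 1 recovers the normative update."""
    damped = {h: likelihood[h] ** c for h in prior}
    return bayes_update(prior, damped)


prior = {"H1": 0.5, "H2": 0.5}
likelihood = {"H1": 0.7, "H2": 0.3}   # P(observed datum | hypothesis)

normative = bayes_update(prior, likelihood)
conservative = conservative_update(prior, likelihood, c=0.5)
print(normative["H1"], conservative["H1"])
```

For this example the conservative posterior for H1 lies between the prior (0.5) and the Bayesian posterior (0.7), which is the qualitative pattern described above.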

The optional stopping problem is related to information-seeking 
ability of the human. In this problem, the DM is provided with an option, 
at each stage of the process, to seek (purchase, sample) one more
observation, or to stop and make the terminal decision. Virtually all the
models of optional stopping are normative in construct. They were
developed within the Bayesian framework using the subjectively expected
loss of the sequential decision process as the minimizing criterion of
performance. In model-data comparisons, it was found that all the
relevant procedural variables (e.g., pay-offs, prior probabilities, etc.)
strongly influenced the number of observations, but not as much as the
normative model predicted. It was also found that the optimal expected
loss was quite insensitive to large deviations in the optimal decision
policy ("curse of insensitivity").

The dynamic decision-making problems have not been studied as
extensively as the static or sequential decision-making problems. This
is due, mainly, to their inherent complexity, analytic sophistication
and difficulties in implementing experiments on a computer. Most of the
dynamic decision paradigms considered to date are taken from other fields
such as economics and operations research. Typically, the modeling
approach begins with a normative construct based on dynamic programming,
and then includes human limitations and constraints to produce
normative-descriptive models. A common approach to the derivation of a
normative-descriptive model is to first compare observed behavior with that
prescribed by the normative (truly optimal) model. The discrepancies are
then interpreted either in terms of limitations on the
information-processing capacity or the human's misperception of the task.
The limitation on the information-processing capacity can be linked to the
DM's finite memory, his limited ability to project the effects of his
present decisions into the future, his limited attention span, loss of
decision time, misaggregation of data, etc. The limitation due to
misperception of the task can be handled by postulating non-isomorphic
internal models and differing subjective and objective cost functionals.
The optimal decision policy is obtained under these cognitive and
perceptual constraints, and then compared with the actual behavior.
However, at present there does not exist a systematic method of
identifying the human limitations beyond the current psychological
knowledge. Moreover, the dynamic decision-making models, like those of
optional stopping, are plagued with the "curse of insensitivity", i.e.,
optimal expected loss is insensitive to large deviations in the optimal
decision strategy.

In assessing the potential application of the existing behavioral
decision models to the MTDP, we conclude that none of them addresses the
real-time decision-making issue of the MTDP. However, there exists a
rich experimental literature which can provide insights and ideas into
the nature of human limitations in information-processing and
decision-making contexts. These issues are explored in section 1.6.

1.4 Multi-Task Decision-Making

Sheridan's work on the optimal allocation of personal presence [6]
might be thought of as a preliminary step towards human modeling in a
multi-task context. In this work, Sheridan was concerned with the
dynamic human choice between two alternatives, viz., direct presence by
transporting himself from one location to another, or vicarious presence via
communication. He employed a dynamic programming formulation to obtain
optimal decisions over the planning horizon, with states being the
locations to be considered.

Rouse and Greenstein [7] posed the multi-task decision problem in
terms of event detection and attention allocation. They considered a
multi-task paradigm in which the subjects were presented with the process
histories of several dynamic systems, and were instructed to detect process
failures and react to them as quickly as possible. Rouse and Greenstein
modeled human event (failure) detection by generating conditional
probabilities of event occurrences, given the observation set, via
discriminant analysis. The attention allocation problem was formulated in
the framework of a single server queueing model with the object of
minimizing the weighted expected waiting time. Thus, unlike the multi-task
decision paradigm of our work, the tasks in Rouse and Greenstein's study
stay in the queue until they are acted upon by the DM. They noted the
application of the model to computer-aiding, but the theoretical as well as
experimental results are inconclusive.

Tulga [8] formulated the multi-task decision problem in the
framework of a dynamic, deterministic, single-machine sequencing model. In
Tulga's paradigm, the tasks are represented by rectangles of varying
height (value density) and width (task duration, processing time).
Tasks appear randomly in time and position and move at a constant
velocity towards a deadline. The subject's task is to attend to one task
at a time and thus cause that task's width to collapse uniformly and, one
hopes, to disappear before the task reaches the deadline. The reward
earned is the aggregate reduction in the areas of all tasks. Assuming
stationary task parameters, an open-loop feedback optimal (OLFO) decision
policy was obtained by solving a deterministic optimization problem every
time a new or expected task arrives, and every time a task is completed.
Dynamic programming with branch and bound strategies was employed to
solve the resulting optimization problem.

The studies of Tulga, and of Rouse and Greenstein are particularly 
germane to the present research as they exemplify two of the most popular 
modeling approaches to the multi-task decision problem (MTDP), viz., 
sequencing (combinatorial) and queueing-theoretic approaches. In section
1.6, we address some of the limitations of these two approaches to the
MTDP and indicate how we have overcome their shortcomings via a
semi-Markov decision process (SMDP) approach.

1.5 Experimental Paradigm 

The primary focus of this research effort is on human
information-processing and dynamic decision-making behavior in multi-task
situations. In order to minimize extraneous complexities, such as
intricate task structure, resource constraints, etc., we have considered a
simple, yet realistic, computer controlled experimental set-up shown in
Fig. 5. This experimental paradigm is a modified version of the one used
by Tulga [8]. In the experiments, the subjects observe a CRT screen on
which multiple, concomitant tasks are represented by moving rectangular
bars. The bars appear at the left edge of the screen and move at different
velocities to the right, disappearing upon reaching the right edge.
Thus, the screen width represents an "opportunity window". In the
present experimental paradigm, there can be, at most, a total of five tasks
on the CRT screen, with a maximum of one on each line. This number is
commensurate with the results of Miller [9] on the limitations of human
information-processing capacity.

FIG. 5: EXPERIMENTAL APPARATUS

The height (reward, value) of each bar is either one, two or three
units. The number of dots, $m$ ($1 \le m \le 5$), displayed on a bar
represents the time (in seconds) required to process the task. The subject
may process a task by holding down the appropriate push-button as in
Fig. 5. By processing a task successfully, the subject is credited with
the corresponding reward ($r$ = 1, 2 or 3), and the completed task is
eliminated from the screen. However, no partial credit is given.
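
The task representation described above can be summarized in a minimal Python sketch. The screen width, the class and function names, and the rule that a task finishing exactly at the right edge still earns credit are all assumptions made for illustration; the report does not specify them.

```python
from dataclasses import dataclass

SCREEN_WIDTH = 100.0   # hypothetical screen width, arbitrary units
EPS = 1e-9             # tolerance for floating-point time accumulation


@dataclass
class Task:
    reward: int             # bar height: 1, 2 or 3 units
    processing_time: float  # seconds of button-holding still required
    velocity: float         # screen units per second
    position: float = 0.0   # distance travelled from the left edge

    def time_available(self):
        """Remaining opportunity window, in seconds."""
        return (SCREEN_WIDTH - self.position) / self.velocity

    def is_feasible(self):
        """True if the task can still be completed before it leaves."""
        return self.processing_time <= self.time_available() + EPS


def step(tasks, attended, dt):
    """Advance all bars by dt seconds; `tasks` maps line number to Task.

    `attended` is the line being processed (0 means monitoring only).
    Returns the reward earned in this step; no partial credit is given.
    """
    earned = 0
    for line in list(tasks):
        task = tasks[line]
        task.position += task.velocity * dt
        if line == attended:
            task.processing_time -= dt
        if task.processing_time <= EPS:
            earned += task.reward      # completed: full credit
            del tasks[line]
        elif task.position >= SCREEN_WIDTH:
            del tasks[line]            # deadline passed: no credit
    return earned


tasks = {1: Task(reward=3, processing_time=2.0, velocity=20.0),
         2: Task(reward=1, processing_time=20.0, velocity=10.0)}
total = sum(step(tasks, attended=1, dt=0.05) for _ in range(40))
print(total, sorted(tasks))
```

The `is_feasible` check captures the basic trade-off facing the subject: a high-value task is worth attending to only if its remaining processing time fits within its shrinking opportunity window.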


The above experimental framework retains the essential features of
the multi-task decision problem in a manageable, yet manipulable,
context. Using this formulation, the effects of key task variables on
human decision-processes are studied via the following five experimental
conditions:


(i) Condition A: Equal task velocities.
(ii) Condition B: Fixed rewards of 3 units for each task.
(iii) Condition C: Equal processing times of 3 sec. for each task.
(iv) Condition D: Full blown, where none of the variables is fixed.
(v) Condition $B_y$: Similar to condition B, but parallel monitoring
is denied.

In condition $B_y$, the images of all the bars, except the one being
processed, are blanked from the CRT screen. This prevents subjects from
monitoring other tasks, and, perhaps, deciding on the next task to be
acted upon. Thus, the subjects are forced to act in a serial mode under
this experimental condition.

Six subjects, all University of Connecticut graduate engineering
students, were well-trained on the experimental paradigm. The
relationships among the tasks' velocities and processing times were carefully
chosen so as to preclude a perfect score, and to motivate the subjects to
use a rational sequencing algorithm. In all cases, the subjects were
instructed to maximize the accumulated reward, and were scored using the
total score, as well as the percentage of a perfect score. They were
informed of their score following each 90 sec. run and were encouraged
to keep it as high as possible.

In the data-taking runs each subject was presented with eight
replications of each experimental condition, in randomized order. This was
achieved via a "scrambling technique" that switched tasks among the five
parallel lines for different runs [10]. The tasks were unscrambled at
the time of data analysis. This type of experimental design, when
aggregated across subjects, yields ensemble statistics that are indicative of
the subjects' population. The source of randomness in this design is the
inter-subject variability. This type of design has the added advantage
of minimizing artifacts such as the effects of learning.

The data collected were time-histories for each line i of the subject's
decisions, $d_i(t)$; the task completion status, $c_i(t)$; and the error
sequence, $e_i(t)$. The variables $d_i(t)$, $c_i(t)$ and $e_i(t)$ are binary
numbers defined by

$$d_i(t) = \begin{cases} 1 & \text{if a subject was processing a task on line } i \text{ at time } t \\ 0 & \text{otherwise} \end{cases} \tag{1.5a}$$

$$c_i(t) = \begin{cases} 1 & \text{if a subject had completed a task on line } i \text{ by time } t \\ 0 & \text{otherwise} \end{cases} \tag{1.5b}$$

and

$$e_i(t) = \begin{cases} 1 & \text{if a subject was processing a task on line } i \text{ at time } t \text{ which cannot be successfully completed} \\ 0 & \text{otherwise} \end{cases} \tag{1.5c}$$

In Eq. (1.5a), i=0 refers to the "do nothing" or monitoring decision.

The variable $c_i(t)$ is set to zero at the end of the opportunity window
of the present task, before the arrival of the next task in the sequence.
At a sampling rate of 20/sec., each run yielded 1800 data points for
each of the variables recorded. For the same experimental condition,
the time-histories were ensemble averaged to obtain the decision probabilities,
$P_{di}(t)$; completion probabilities, $P_{ci}(t)$; and error probabilities,
$P_{ei}(t)$. The averaging process was first done for each subject,
and then across subjects to obtain the "grand" averages. The details of
data analysis are presented in section 3.1.
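
The two-level averaging described above can be illustrated with a short Python sketch. It is only an illustration of the idea, not the analysis code used in this study; the function names and the toy data (two subjects, two runs each, four samples per run instead of 1800) are assumptions made here.

```python
def ensemble_average(runs):
    """Average equal-length time-histories sample by sample."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs[0])
    if any(len(r) != n for r in runs):
        raise ValueError("all runs must have the same number of samples")
    return [sum(r[k] for r in runs) / len(runs) for k in range(n)]


def grand_average(runs_by_subject):
    """Average within each subject first, then across subjects."""
    per_subject = [ensemble_average(runs) for runs in runs_by_subject]
    return ensemble_average(per_subject)


# Hypothetical binary decision histories d_i(t) for one line i
subject_a = [[1, 1, 0, 0], [1, 0, 0, 0]]
subject_b = [[1, 1, 1, 0], [1, 1, 0, 0]]
p_d = grand_average([subject_a, subject_b])  # estimate of P_di(t)
```

Because every subject contributes the same number of runs in this design, averaging per subject and then across subjects gives the same result as pooling all runs; the two-step form simply mirrors the procedure stated in the text.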

1.6 SUMMARY 

In previous sections, we have examined the relevant literature on
behavioral decision theory and multi-task decision-making. This overview
has suggested several limitations of the previous work and possible means
to overcome them. The following conclusions and comments in this regard
seem appropriate:

(i) Status of behavioral decision theory: Most of the literature
on behavioral decision theory is devoted to single-stage
decision-making. The existing literature on multi-stage
decision-making emphasizes either information-processing
(diagnosis) or action selection. However, any realistic
multi-task system involves diagnosis as well as dynamic
(usually real-time) action selection.

(ii) Normative versus Descriptive models: Theories of rational
behavior may be normative or descriptive. The normative
theory attempts to prescribe how decisions should be made in
the face of a given situation. The descriptive theory, on
the other hand, purports to explain how decisions are made
in a given situation. A review of behavioral decision
theory [2] shows that normative (prescriptive) models can
serve as a first approximation to assess human decision behavior,
or they can be used as a formal standard against
which to compare actual performance. The model developed in
this thesis is normative in construct.

(iii) Need for good Multi-task paradigms: Experiments in multi-task
decision-making may, by their very nature, become overly
elaborate and cumbersome. This is especially true when the
experimenter yields to the natural temptation to simulate
the "entire scenario", thereby possibly masking trends in the
resulting data. In summarizing the research on behavioral
decision theory, we noted that the discrepancies between a
normative model and observed behavior can be attributed to
cognitive (intellectual or information-processing) limitations,
misperception of the task and procedural variables.
Since there exists no systematic method of identifying the
human limitations beyond current psychological knowledge,
the multi-task supervisory control decision paradigms should
be designed to minimize the limitations due to misperception
of the task and procedural variables. Such an experimental
paradigm was developed in section 1.5. This paradigm is
simple, realistic, and easy to understand and to administer.
It retains the essence of the multi-task decision problem
by presenting the human with a dynamic situation wherein
tasks of different value, time requirement and deadline
compete for his attention. Due to its simplicity, the
paradigm minimizes the possibility of human misperception
of the tasks. If we can understand and model the behavior
of well-trained subjects in simple laboratory tasks, then
perhaps this knowledge may be extended to more complex
tasks. The ability to repeat laboratory experiments is a
powerful tool, for it allows us to study intersubject
differences and the effects of different information, and
provides us with a measure of the variability inherent in
the human's decision process.

(iv) Curse of insensitivity: Most normative decision models of
behavioral research are plagued with the "curse of insensitivity":
substantial variations in the optimal decision
policies lead to only a small change in the resulting
cost. This problem could have been minimized, to some extent,
by the proper choice of reward and processing time structures,
such as the discrete format employed in the present experimental
paradigm.

(v) Modeling approaches: Queueing and sequencing (combinatorial)
theoretic approaches [7,8] appear to be the most popular
approaches to modeling human decision strategy in a
MTDP. The main shortcoming of the classical queueing theory
approach is that it is extremely difficult, if not impossible,
to determine the structure of an optimal strategy in the MTDP,
as it involves a dynamic, endogenous, preempt-repeat
priority discipline with non-conservative† customer (task)
and server (human) characteristics††. The main advantage of
this approach is that it can handle stochastic arrivals
(which are assumed to occur indefinitely into the future),
and stochastic processing times. That is, the approach can
incorporate uncertainty in the task characteristics. However,
in many practical applications the task characteristics
are time dependent and are, to a large extent, predictable.
Therefore, it is the randomness associated with the decision-maker
that is of primary importance, and the stochastic properties
of tasks are a second order effect (but not necessarily
negligible). Moreover, the classical queueing theory
places great emphasis on finding stationary measures of
system effectiveness, whereas the dominant issue in real
systems is the determination of instant to instant human
decision behavior, while servicing a time dependent demand.
The combinatorial approaches, on the other hand, involve
sequencing a finite number of tasks whose arrival times,
processing times and deadlines are known deterministically
(if random, mean values are used). This approach cannot
easily handle randomness associated with the decision-maker
or the task parameters. Thus, the incorporation of human
randomness into the decision strategy is difficult using a
sequencing theoretic approach. The control and semi-Markov
decision process approach to modeling the human decision
strategy in a MTDP, developed in Chapter II and in [10], subsumes
the earlier two approaches and can explicitly incorporate
human limitations.

† This implies that a customer (task) may leave before being served or
the server (human) may refuse to service a low priority customer (task).

†† With moderate complexity, a stationary, state dependent, non-preemptive
priority policy in non-conservative queueing systems can be determined
using the tools of dynamic programming. The reader is referred to [11]
for details.

(vi) Drawbacks of Tulga's model: One major drawback of Tulga's
model is that the fundamental human limitations have not been
identified. First, it is almost impossible for the human to
have perfect estimates of the time available and the time required
to process a task. Second, it is well known [12] that
humans do not respond to the same stimulus in identical
fashion at different times (due to their limited resolving
power), even when there are no changes in their information
or resources. This makes it difficult to validate/invalidate
the truly normative, sample-path (Monte Carlo) models
of the type espoused by Tulga. Third, it is also well known
that the human is a sequential decision-maker with limited
information-processing capabilities [13]. Thus, it is
difficult to justify normative, combinatorial models based
on dynamic programming (DP), as they require the specification
of complete future courses of action before any task is
acted upon. Moreover, the computational load of the DP
increases exponentially with the number of tasks to be
sequenced. On the other hand, if a finite stage DP is
advocated as a compromise, then the nagging question is how
to choose the number of stages. The last point is not a
peculiarity of Tulga's model alone. It applies to all the
behavioral models employing a DP formulation. The present
dynamic decision model (DDM) overcomes the first two cited
limitations of Tulga's model by explicitly including human
randomness in the model, and circumvents the combinatorial
problem of DP by postulating a myopic (one-stage) decision
policy.

(vii) Modes of Model implementation: If the subjects do not come
from a homogeneous population, in terms of their decision
performance, then the sample path (Monte Carlo) models of the
type proposed in [8] make little sense. The DDM developed in
this thesis can be exercised either in a covariance propagation
mode or in a Monte Carlo (sample-path) mode. The first
mode gives probabilistic predictions necessary for model-data
validation. This is done in chapter III. The second mode is
appropriate for using the model as a decision aid. These
issues are explored in chapter IV.








II. ANALYTIC MODEL FOR HUMAN TASK SEQUENCING


Our analytic approach to modeling human decision-making in multi-task
environments is based on, and will extend, the optimal control model (OCM)
of Kleinman et al. [14-16]. The optimal control model is a general and
versatile methodology for predicting human response in stochastic,
multivariable control tasks. The modeling approach, rooted in modern control
and estimation theories, is based on the assumption that a well-trained
and well-motivated human operator behaves in an optimal manner, subject
to his inherent limitations and constraints, and the perceived task
objectives. The OCM has been applied successfully in a variety of manual
control tasks, as well as in tasks that do not involve closed-loop control
[17-18]. However, all these studies emphasize either the continuous control
function of the human, or his ability to test binary hypotheses.
They do not address the decision-making/task-sharing roles of the human
that gain prominence in supervisory control, or in semi-automated subsystems
of the type discussed in chapter I.

This chapter extends the conceptual framework of the OCM methodology
to multi-task situations in which monitoring, information-processing and
dynamic decision-making (task sequencing) are the operator's main activities.
The basic idea of our modeling approach is to integrate decision-directed
elements within an OCM-like construct. As with the OCM, the
approach is normative, in that we attempt to determine what a well-trained
and well-motivated human operator should do, given the task objectives.
In the sections below, the key elements of OCM are outlined briefly
and the dynamic decision model (DDM) of human task sequencing performance
emerges.

2.1 Optimal Control Model of Human Response - An Overview 

2.1.1 Background [14-16]

The basic structure of the OCM is shown in Fig. 6 and consists of 
the following elements: 

(i) Perceptual model: The perceptual model translates displayed
variables, $y(t)$, into noisy, delayed perceived variables
$y_p(t)$, which form the information upon which the human bases
his subsequent estimation, control and/or decision strategies.

(ii) Human Limitations: The OCM includes time-delay, human randomness,
small signal threshold phenomenon, and scanning
effects in its formulation. The time-delay, $\tau$, accounts for
the internal human delays associated with visual, central processing
and neuromotor pathways. Human randomness is assumed
to be manifested as errors in observing/processing displayed
quantities and in executing intended control movements. Thus,
observation noise, $v_y(t)$, and motor noise, $v_u(t)$, are lumped
representations of the controller's central processing and sensory
randomness. The non-linear threshold in the OCM captures the
"neglect" phenomenon exhibited by humans when observing small
stimuli. Finally, the scanning-interference model accounts
for the fact that the human must allocate monitoring attention
among the various displays [16].

(iii) Information-Processor: The information-processor consists of
a Kalman filter and predictor that compensate for the psychophysical
limitations of the human to generate the "best"
estimate of the (augmented) system state $x(t)$ from the perceived
information base.

[Fig. 6: Optimal Control Model of Human Response]

(iv) Feedback Gains: The control task requirements are assumed to
be adequately represented by the minimization of a quadratic
cost functional. The operator's commanded control input is
$u_c(t) = -L\hat{x}(t)$, where the feedback gains $L$ minimize the cost
functional.

(v) Motor model: The motor model accounts for the bandwidth limitations
of the human via the neuromotor dynamics, $(\tau_N s + 1)^{-1}$,
and his inability to generate noise-free control signals via
the motor noise, $v_u(t)$.

The Kalman filter-predictor, followed by the feedback gains, represent
the adaptations by which the human operator optimizes his performance
and compensates for his inherent limitations. In general, these model
elements depend on the (human's internal characterization of the) system
dynamics, human limitations, and the task requirements. The Kalman filter
generates the best estimate of the delayed (augmented) state

$$p(t) = E\{x(t-\tau) \mid y_p(\sigma),\ \sigma \le t\} \tag{2.1}$$

according to an equation of the form

$$\dot{p}(t) = A\,p(t) + B\,u_c(t-\tau) + G_f\,\nu(t) \tag{2.2}$$

where the filter gains, $G_f$, are determined from a matrix Riccati differential
equation. The quantity

$$\nu(t) = y_p(t) - C\,p(t) \tag{2.3}$$

is the innovation process and represents the difference between the actual
and expected observations. Basically, $\nu(t)$ is the new information that
is brought to the filter by $y_p(t)$. The predictor generates an estimate
of the present state, $\hat{x}(t)$, by projecting $p(t)$ ahead by $\tau$ seconds to
compensate for the time-delay.

The state estimate, $\hat{x}(t)$, and its associated covariance matrix, $\Sigma(t)$,
form a sufficient statistic for the closed loop man-machine system. In
other words, the pair $\{\hat{x}(t), \Sigma(t)\}$ can be used as a basis for determining
subsequent control/decision strategies. A second quantity of interest in
the OCM information-processor is the innovation process, $\nu(t)$, defined in
Eq. (2.3). When the internal model of the Kalman filter adequately
represents the controlled element dynamics, the process $\nu(t)$ is a zero-mean,
white Gaussian noise process with covariance $V_y(t)$ equal to the
observation noise covariance. However, when the internal model and system
dynamics are not commensurate, the human's estimate of the system behavior
deviates from the observed dynamic behavior. These differences produce a
non-zero mean, correlated innovation process. This property can be used
to develop models of human failure detection [18], and to investigate the
effects of training on human performance.

2.1.2 Elements for Decision-making/Detection 

A key feature of the OCM's information-processor is that it provides
the statistical characteristics of two important variables: the state
estimate $\{\hat{x}(t), \Sigma(t)\}$; and the innovation process $\{\nu(t), V_y(t)\}$. These,
in turn, have provided a mechanism for studying selected decision/detection
phenomena in man-machine systems. For example, Levison and Tanner
[17] studied how well subjects could determine if a signal embedded in
noise exceeded a given threshold. Their model assumed that the operator
was an optimal decision-maker in the sense of maximizing the subjectively
expected utility. For equal penalties on missed detections and false
alarms, this rule reduces to a likelihood ratio test, which was implemented
using the sufficient statistic $\{\hat{x}(t), \Sigma(t)\}$. In another study, Gai
and Curry [18] used the OCM information-processing submodel to analyze
failure detection in a simple laboratory task and in an experiment simulating
pilot monitoring of an automatic landing system. They considered
only instrument failures, and modeled the detection process as a sequential
hypothesis test on the mean of the innovations, $\nu(t)$.
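
To make the idea of a sequential test on the innovations mean concrete, the following Python sketch implements a textbook Wald sequential probability ratio test for a shift in the mean of Gaussian samples. It is not the specific procedure of [18]; the function name, the hypothesized shift `theta1`, the innovation standard deviation `sigma` and the error rates `alpha` and `beta` are assumptions introduced here for illustration.

```python
import math


def sprt_mean_shift(innovations, theta1, sigma, alpha=0.05, beta=0.05):
    """Wald SPRT: H0 mean = 0 versus H1 mean = theta1, known sigma.

    Returns (decision, number of samples used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross above: declare a failure
    lower = math.log(beta / (1.0 - alpha))   # cross below: declare normal operation
    llr = 0.0
    for k, nu in enumerate(innovations, start=1):
        # Log-likelihood ratio increment for one Gaussian sample
        llr += (theta1 / sigma ** 2) * (nu - theta1 / 2.0)
        if llr >= upper:
            return "failure", k
        if llr <= lower:
            return "normal", k
    return "undecided", len(innovations)


# Idealized, noise-free innovation sequences (hypothetical data)
biased = sprt_mean_shift([1.0] * 20, theta1=1.0, sigma=1.0)
unbiased = sprt_mean_shift([0.0] * 20, theta1=1.0, sigma=1.0)
```

With zero-mean, white innovations the statistic drifts toward the lower threshold; a persistent bias in the innovations drives it toward the upper one, which is the property exploited for failure detection.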

These studies demonstrate the potential of modern estimation techniques
in decision-making/detection situations. An important feature of
the work in [17-18] is that it provides a validation of the Kalman
filter-predictor submodel in tasks not involving closed-loop control. When these
validation results are combined with the overall verification of the OCM
in manual control tasks, the potential of a control-theoretic construct
for modeling human decision processes emerges.

2.2 Overview of Modeling Approach 

Our approach to modeling human decision behavior parallels the optimal
control model of human response in spirit, but not in form. In the
OCM, the control and information-processing strategies are separable.
Once an estimate of the system state is available, the linear feedback
control law uses this estimate as if it were the true state. Human limitations
affect only the quality of (augmented) state estimates.

This type of separation has been found to be plausible in the present
dynamic decision model (DDM). For any task i in the opportunity window,
it is possible to show that $T_{Ri}(t)$, the time required to complete task i
starting at time t; and $T_{ai}(t)$, the time available/remaining to work on
task i at time t, are valid decision state variables. That is, these two
quantities satisfy the axiomatic definition of a state, namely that it must
provide the complete running summary of past actions (decisions). The joint
density of the decision states of all tasks in the opportunity window is
estimated from the information-processor of the DDM, and provides sufficient
information for the decision-process. The statistics of decision
states, along with the task values, $r_i(t)$, and a performance metric, are
used to compute the decision strategy. By analogy to the control theoretic
OCM, the values $r_i(t)$ play the role of cost functional weights, while
the decision state variables correspond to system state variables.

A block diagram of the DDM is shown in Fig. 7. Each of the N tasks
in the opportunity window is represented by a dynamic subsystem acted on
by disturbances to account for the non-stationarities in task characteristics.
The perceived outputs are delayed, noisy versions of the
task states $\{x_{Ti}\}$ and are contingent upon the monitoring process. The
perceived outputs are processed to produce the best linear unbiased
estimates of the task states $\{\hat{x}_{Ti}\}$ and their associated covariances
$\{\Sigma_i\}$ via a Kalman filter-predictor submodel. The statistics of the task
states are, in turn, used to determine the first and second order
statistics of the decision states $\{\hat{T}_{Ri}, \sigma_{Ri}\}$ and $\{\hat{T}_{ai}, \sigma_{ai}\}$. The
statistics of the decision states, along with the task values, $r_i(t)$, are
combined to determine the attractiveness measure, $M_i(t)$, of each task in
the opportunity window. Subsequently, the measures are used to generate
the probability $P_{di}(t)$ of acting on each of the N tasks and the probability
$P_{d0}(t)$ of not acting on any task (or the monitoring probability,
$P_{dm}(t)$).

The next few sections expand briefly on various features of DDM. 

2.3 System Dynamics 

In formulating the multi-task decision problem, it is convenient to
differentiate among the process state or Markov state, $s$; the set of
task states, $x_{Ti}$; and the set of decision states, $x_{di}$. In the present
experimental context, the process state, $s$, is related to the status of
the CRT display and indicates whether or not a task is present on each of
the K (= 5) lines, K being the system capacity. The task state, $x_{Ti}$,
describes the dynamical variables internal to each task i. In the present
experimental paradigm, the task state consists of the instantaneous position
and velocity of the bar and the time required to process the task.
Finally, the decision state, $x_{di}$, consisting of the time available and the
time required to process task i, is a memoryless functional transformation
of the task state. These notions are formalized below.

2.3.1 Process state or Markov state, s

Clearly, the decision set $\mathcal{D}(t)$, the set of (N+1) feasible decisions at
time t, is given by

$$\mathcal{D}(t) = \mathcal{A}(t) \cup \{0\}$$

Thus, the (N+1) possible decisions at any time t are to attend to one of
the N tasks in the opportunity window, or do nothing. The set of feasible
decisions is time-varying as a consequence of estimation and actions
by the DM, and as a result of the arrival of new tasks with different
attributes.

2.3.2 Task State, $x_{Ti}$

For any task on line i, the time required to complete task i, $T_{Ri}$;
the position of the bar from the left edge, $\ell_i$; and the velocity, $v_i$, of
the bar constitute the task state variables, as shown in Fig. 8.

The state variable $x_{Ti1}$, denoting the time required to complete task
i starting at time t, is action oriented. Its evolution can be characterized
by the differential equation

$$\dot{x}_{Ti1}(t) = \dot{T}_{Ri}(t) = -d_i(t) \tag{2.7}$$

where $d_i(t)$ is a binary decision variable given by

$$d_i(t) = \begin{cases} 1 & \text{if the decision is to act on task } i \text{ at time } t \\ 0 & \text{otherwise} \end{cases}$$

Since the human can act on only one task at any given time, we have the
following constraints on the decision variables:

$$d_i(t) = 1 \ \text{implies} \ d_j(t) = 0, \quad i \ne j; \ i, j \in \mathcal{D}(t)$$

where $d_0(t) = 1$ refers to the "do nothing" or monitoring decision.†

The remaining task state variables, representing the position and velocity
of the bar, are given by

$$\begin{aligned} \dot{x}_{Ti2}(t) &= \dot{\ell}_i(t) = x_{Ti3}(t) \\ \dot{x}_{Ti3}(t) &= \dot{v}_i(t) = w_i(t) \end{aligned} \tag{2.8}$$

where $w_i(t)$ is a zero-mean, white Gaussian noise with variance $W_i(t)$
that accounts for (perceived) non-stationarities in task velocity.

In vector-matrix form, the dynamics of the task state can be represented
by

$$\dot{x}_{Ti}(t) = A\,x_{Ti}(t) + b\,w_i(t) - g\,d_i(t); \quad i \in \mathcal{A}(t) \tag{2.9}$$

where

$$x_{Ti} = [x_{Ti1},\ x_{Ti2},\ x_{Ti3}]', \quad A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad g = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

The subsystem state in Eq (2.9) is reinitialized to the new task attributes
every time a new task arrives on line i.

† Note that the defining differential equation for $T_{Ri}$ assumes a preempt-resume
processing discipline, while the experimental paradigm was designed
to operate in a preempt-repeat mode. The form of Eq (2.7) was
chosen after examining the experimental data, which showed that the
human seldom preempted a task in all the experimental conditions studied.
However, it is straightforward to include the effects of a preempt-repeat
mode of processing by reinitializing the dynamical equation for $T_{Ri}$ every
time $d_i(t)$ switches from 1 to 0 and $T_{Ri}(t)$ is non-zero.
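
The task-state dynamics of Eqs (2.7)-(2.8) can be stepped forward numerically. The Python sketch below uses a simple Euler step and the preempt-resume form of Eq (2.7); the function names, the step size, the clamping of the time required at zero and the way the disturbance sample `w` enters are assumptions made here for illustration, not part of the model statement.

```python
def step_task_state(state, acting, dt, w=0.0):
    """Advance (T_R, position, velocity) by one Euler step of Eqs (2.7)-(2.8).

    acting : True when d_i(t) = 1 (the task is being processed)
    w      : sample of the velocity disturbance w_i(t); 0 keeps velocity constant
    """
    t_req, pos, vel = state
    if acting:
        # Eq (2.7): time required decreases at unit rate while acting.
        # Clamping at zero is an added assumption (a finished task stays finished).
        t_req = max(0.0, t_req - dt)
    pos += vel * dt  # Eq (2.8): position integrates velocity
    vel += w * dt    # Eq (2.8): velocity integrates the disturbance
    return (t_req, pos, vel)


def simulate(state, decisions, dt):
    """Apply a sequence of binary decisions d_i(t) to one task."""
    for acting in decisions:
        state = step_task_state(state, acting, dt)
    return state


# A task needing 3 sec, bar at the left edge moving at 1 unit/sec, 20 samples/sec.
# The operator monitors for 1 sec and then acts on the task for 1 sec.
final = simulate((3.0, 0.0, 1.0), [False] * 20 + [True] * 20, dt=0.05)
```

A preempt-repeat discipline could be mimicked by resetting the first state component to the full processing time whenever `acting` switches from True to False before completion.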


2.3.3 Decision State, $x_{di}$

The decision state, $x_{di} = [T_{Ri},\ T_{ai}]'$, is related to the task state
via a functional transformation as

$$x_{di}(t) = h[x_{Ti}(t)]$$

In the present experimental context, the time required to complete task i
starting at time t, $T_{Ri}(t)$, is given by

$$T_{Ri}(t) = x_{di1}(t) = x_{Ti1}(t) \tag{2.10}$$

The other decision state variable $T_{ai}(t)$, the time available to work on
task i at time t, is related to the task state $x_{Ti}$ via

$$T_{ai}(t) = x_{di2}(t) = \frac{L - x_{Ti2}(t)}{x_{Ti3}(t)} = \frac{L - \ell_i(t)}{v_i(t)} \tag{2.11}$$

where L is the length of the opportunity window (≈ 12").

In the present experimental paradigm, $T_{ai}(t)$ is assumed to be independent
of $T_{Ri}(t)$. This is not a restrictive assumption. If the nature of the
relationship between $T_{ai}(t)$ and $T_{Ri}(t)$ is known, it can be incorporated into
the model formulation in a straightforward manner.
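
As a small worked example of Eqs (2.10)-(2.11), the following Python sketch maps a task state to the corresponding decision state. The function name and the numerical values are hypothetical, and the requirement of a positive velocity is an assumption added here so that the time available is well defined.

```python
def decision_state(task_state, window_length=12.0):
    """Map a task state (T_R, position, velocity) to (T_R, T_a), Eqs (2.10)-(2.11)."""
    t_req, pos, vel = task_state
    if vel <= 0.0:
        # Assumption: bars move toward the right edge of the window
        raise ValueError("velocity must be positive for T_a to be defined")
    t_avail = (window_length - pos) / vel
    return (t_req, t_avail)


# A bar 3 units from the left edge, moving at 1.5 units/sec, needing 3 sec of work
t_req, t_avail = decision_state((3.0, 3.0, 1.5))
```

Here 9 units of travel remain at 1.5 units/sec, so 6 sec are available for a task that requires 3 sec.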


2.3.4 Perceptual Model

Since the processing times are quantized in steps of 1 sec., the
displayed information consists of a modified version of the task state,
$x_{Ti}$. Thus,

$$y_i(t) = \begin{bmatrix} Q[T_{Ri}(t)] \\ \ell_i(t) \\ v_i(t) \end{bmatrix} = x_{Ti}(t) + g\,v_q(t) \tag{2.12}$$

where $v_q(t)$, the linearized quantization error introduced by the quantizer
$Q[\cdot]$, is bounded by

$$-0.5 \le v_q(t) \le 0.5 \tag{2.13}$$

In order to represent the effects of quantization on the estimation process,
it is frequently assumed [19] that $v_q(t)$ is uncorrelated with
$T_{Ri}(t)$, and that it is a stationary, zero-mean, white noise process
uniformly distributed over the range of quantization error of Eq (2.13).
The autocovariance of the noise process $v_q(t)$ can be shown to be

$$E[v_q(t)\,v_q(\sigma)] = V_q\,\delta(t-\sigma) = \tfrac{1}{12}\,\delta(t-\sigma) \tag{2.14}$$
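
The value 1/12 in Eq (2.14) is the variance of an error uniformly distributed over a unit-wide interval. The Python sketch below checks this numerically by rounding a fine grid of processing times to the nearest 1 sec. step; the function name, the grid size and the use of round-to-nearest are assumptions made here for illustration.

```python
import math


def quantization_error_variance(step=1.0, n=1000, span=10):
    """Variance of the error made by rounding to the nearest multiple of `step`.

    A fine grid of n points per step over `span` steps stands in for a
    uniformly distributed input.
    """
    errors = []
    for k in range(n * span):
        x = (k + 0.5) * step / n               # grid midpoints, never on a tie
        q = step * math.floor(x / step + 0.5)  # round to the nearest step
        errors.append(x - q)
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / len(errors)


v_q = quantization_error_variance()  # close to 1/12 for a 1 sec. step
```

For a general step size the same computation gives approximately step squared over 12, which is why a 1 sec. quantization yields the constant in Eq (2.14).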


Following usual practice, the human is assumed to perceive a noisy,
delayed and linearized replica of $y_i(t)$ given by

$$y_{pi}(t) = y_i(t-\tau) + v_{yi}(t-\tau) \tag{2.15}$$

where

$\tau$ = the human's time delay (≈ 0.2 sec)

$v_{yi}(t)$ = the observation noise at time t

The observation noise $v_{yi}(t)$ is a zero-mean, white Gaussian noise process
with diagonal covariance matrix $V_{yi}$. As with the OCM, the diagonal
elements of the observation noise covariance matrix associated with the
task position and velocity are functionally related to the monitoring
strategy and the mean-square values of the corresponding output variables
according to

$$V_{yij}(t) = \frac{\pi\,\rho_{ij}\,E\{y_{ij}^2(t)\}}{f_i(t)} \tag{2.16}$$

where

$y_{ij}$ = jth element of the vector $y_i$; j = 1,2,3

$\rho_{ij}$ = noise to signal ratio (NSR) associated with $y_{ij}$ of task
i (≈ .01)

$f_i(t)$ = monitoring allocation to task i

There is assumed to be no intratask attention allocation among the individual
components of the displayed variables, $y_{ij}$, as the task information
is presented in an integrated form. Thus, $f_i(t)$ is the monitoring attention
to task i. Also, since the (linearized) quantization error overshadows
the inherent human randomness in perceiving the decision state
variable, $T_{Ri}(t)$, the observation noise covariance $V_{yi1}$ can be neglected
in comparison to $V_q$. In summary, the time-histories of $y_{pi}(t)$ are
the stimuli upon which the human bases his subsequent estimation and
decision strategies.


2.4 Monitoring Strategy 

The monitoring allocations, $f_i(t)$, affect the subsequent decision
strategy. On the other hand, the specifics of the experimental paradigm
determine whether or not the monitoring strategy is dependent on the
decision strategy. In the present experimental context, if a task i is
acted upon at time t (i.e., $d_i(t) = 1$), it is also monitored. However,
there exist two possibilities for the other tasks $j \ne i$:

(i) Parallel monitoring: In this case, all tasks, including the
one being acted upon, can be monitored simultaneously. (This
corresponds to experimental conditions A-D.) Here, an equal
(monitoring) attention allocation strategy, i.e., $f_i(t) = 1/N$, is
found to be adequate for model applications. This result is
not surprising, since an overview of the existing monitoring
models [10,16] indicates that the overall system performance
is not very sensitive to changes in the monitoring process
over a reasonable range of variation about the optimal strategy,
at least for well-designed displays.

(ii) No Parallel monitoring: In this case, tasks, other than that
being acted upon, are not available for monitoring (experimental
condition $B_y$), but monitoring of all tasks is an
explicit decision alternative. Here, the monitoring process
is strongly coupled to the decision strategy. Noting that
$f_i(t)$ is the ensemble probability of monitoring task i at
time t, we have by the total probability rule

$$\begin{aligned} f_i(t) &= P\{\text{monitor task } i \text{ at time } t\} \\ &= \sum_{j \in \mathcal{D}(t)} P\{\text{monitor } i,\ \text{act on } j\} \\ &= \sum_{j \in \mathcal{D}(t)} P\{\text{monitor } i \mid \text{act on } j\}\, P_{dj}(t) \\ &= P_{di}(t) + \frac{P_{dm}(t)}{N}; \quad i \in \mathcal{A}(t) \end{aligned} \tag{2.17}$$

where it is assumed that the monitoring probability, $P_{dm}(t)$,
is equally distributed among N tasks.
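
The last line of Eq (2.17) is simple to evaluate once the decision probabilities are known. The Python sketch below is a minimal illustration; the function name and the numerical probabilities are hypothetical, and the check that the probabilities sum to one is an added safeguard rather than part of the model.

```python
def monitoring_allocation(p_act, p_monitor):
    """Return f_i = P_di + P_dm / N for each of the N tasks, Eq (2.17).

    p_act     : probabilities P_di(t) of acting on tasks 1..N
    p_monitor : probability P_dm(t) of the monitoring ("do nothing") decision
    """
    n = len(p_act)
    if n == 0:
        raise ValueError("need at least one task")
    if abs(sum(p_act) + p_monitor - 1.0) > 1e-9:
        raise ValueError("decision probabilities must sum to 1")
    return [p + p_monitor / n for p in p_act]


# Two tasks in the window; the operator monitors 30% of the time
f = monitoring_allocation([0.5, 0.2], 0.3)
```

Since the monitoring probability is shared equally, the resulting allocations again sum to one over the N tasks.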
2.5 Information-Processor

The information-processor compensates for the human's inherent randomness,
time-delay and monitoring allocations to produce the "best"
estimate of the decision state from the perceived information base. As
with the OCM, the information-processor consists of a Kalman filter and
a linear predictor. This choice was motivated by the results of [17-18],
which provided an independent verification of the filter-predictor structure
for the information-processor in situations not involving closed-loop
control. The Kalman filter-predictor submodel generates the best
linear unbiased estimates of the task state, $\hat{x}_{Ti}(t)$, and its associated
covariance matrix, $\Sigma_i(t)$. The pairs $\{\hat{x}_{Ti}(t), \Sigma_i(t)\}$ are subsequently
used to compute the first and second order statistics of the decision
state, $x_{di}(t)$, viz., the pairs $\{\hat{T}_{Ri}(t), \sigma_{Ri}(t)\}$ and $\{\hat{T}_{ai}(t), \sigma_{ai}(t)\}$ for
each task i.

2.5.1 Kalman Filter

The Kalman filter generates the best linear unbiased estimate of
the delayed state

$$p_i(t) = E\{x_{Ti}(t-\tau) \mid y_{pi}(\sigma);\ \sigma \le t\}$$

according to an equation of the form

$$\dot{p}_i(t) = A\,p_i(t) - g\,d_i(t-\tau) + G_i(t)\,[y_{pi}(t) - p_i(t)] \tag{2.18}$$

with the initial condition $p_i(t_{0i}+\tau) = [\hat{T}_{Ri}(t_{0i}),\ 0,\ v_i(t_{0i})]'$. Here
$t_{0i}$ is the initial (arrival or ready) time of a task on line i, and
$\hat{T}_{Ri}(t_{0i})$ is the a priori mean of the processing time.

The filter gains $G_i(t)$ are given by

$$G_i(t) = \tilde{\Sigma}_i(t)\,[V_{yi}(t-\tau) + g\,V_q\,g']^{-1} \tag{2.19}$$

where $\tilde{\Sigma}_i(t)$, the error covariance of the delayed estimate $p_i(t)$, is
generated from the usual Riccati equation

$$\dot{\tilde{\Sigma}}_i = A\,\tilde{\Sigma}_i + \tilde{\Sigma}_i A' - \tilde{\Sigma}_i\,[V_{yi}(t-\tau) + g\,V_q\,g']^{-1}\,\tilde{\Sigma}_i + b\,W_i(t-\tau)\,b' \tag{2.20}$$

with the initial condition

$$\tilde{\Sigma}_i(t_{0i}+\tau) = \mathrm{diag}\left[\frac{(T_{RH} - T_{RL})^2}{12},\ 0,\ .01\,\pi\,v_i^2(t_{0i})\right]$$

where $T_{RH}$ and $T_{RL}$ are the a priori maximum and minimum values of the
processing time. The initial uncertainty in the velocity estimation is
assumed to scale with the square of the velocity in accordance with
Weber's law.
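
The initial condition of the Riccati equation can be evaluated directly. The Python sketch below returns the three diagonal entries; the function name and the sample numbers are hypothetical. The first entry has the same form as the variance of a distribution that is uniform between the minimum and maximum processing times, which is offered here as an interpretation rather than a statement from the model derivation.

```python
import math


def initial_covariance_diagonal(t_max, t_min, v0, nsr=0.01):
    """Diagonal of the initial filter covariance: (time required, position, velocity)."""
    if t_max < t_min:
        raise ValueError("t_max must not be smaller than t_min")
    var_time = (t_max - t_min) ** 2 / 12.0  # same form as a uniform-prior variance
    var_pos = 0.0                           # initial position taken as known
    var_vel = nsr * math.pi * v0 ** 2       # scales with the square of the velocity
    return (var_time, var_pos, var_vel)


# Processing time between 1 and 5 sec, initial velocity 2 units/sec
diag = initial_covariance_diagonal(t_max=5.0, t_min=1.0, v0=2.0)
```

Doubling the initial velocity quadruples the third entry, which is the Weber's law scaling mentioned in the text.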

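2.5.2 Linear Predictor

Prediction of the present task state, $\hat{x}_{Ti}(t)$, is obtained by integrating
the vector-matrix linear differential equation

$$\dot{\hat{x}}_{Ti}(\sigma) = A\,\hat{x}_{Ti}(\sigma) - g\,d_i(\sigma) \tag{2.21}$$

from $\sigma = t-\tau$ to $\sigma = t$ with the initial condition $\hat{x}_{Ti}(t-\tau) = p_i(t)$.
The error covariance associated with the task state estimate $\hat{x}_{Ti}(t)$,
denoted by $\Sigma_i(t)$, is given by

$$\Sigma_i(t) = e^{A\tau}\,\tilde{\Sigma}_i(t)\,e^{A'\tau} + \int_{t-\tau}^{t} e^{A(t-\sigma)}\,b\,W_i(\sigma)\,b'\,e^{A'(t-\sigma)}\,d\sigma \tag{2.22}$$

where $\tilde{\Sigma}_i(t)$ is the Kalman filter error covariance generated by Eq (2.20).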

2.5.3 Statistics of the decision state, x,. 

- ---— —=d l 

The statistics of the decision state variable T (t) are readily 

Kl 

computed from those of the task state, sc, (t), as 




conditional mean = *Til^ 
conditional variance = Ej^(t) 


(2.23) 
The human's perception of the conditional density of the decision state
variable $T_{Ri}(t)$ is assumed to be Gaussian with mean $\hat{T}_{Ri}(t)$ and variance
given by Eq (2.23).

In order to compute the statistics of the remaining decision state
variable $T_{ai}(t)$, we note from Eq (2.11) that it involves the ratio of two
Gaussian random variables. If the observation signal-to-noise ratio (SNR)
is sufficiently high†, then it can be shown [20] that $T_{ai}(t)$ is approximately
a Gaussian random variable. An unbiased estimate of $T_{ai}(t)$ and
its variance can be evaluated by linearizing Eq (2.11) about the conditional
unbiased estimates $\hat{\ell}_i(t)$ and $\hat{v}_i(t)$ as


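$$T_{ai}(t) = \frac{L - \ell_i(t)}{v_i(t)} = \frac{L - \hat{\ell}_i(t) - [\ell_i(t) - \hat{\ell}_i(t)]}{\hat{v}_i(t) + [v_i(t) - \hat{v}_i(t)]} \approx \frac{L - \hat{\ell}_i(t)}{\hat{v}_i(t)} - \frac{\ell_i(t) - \hat{\ell}_i(t)}{\hat{v}_i(t)} - \frac{[L - \hat{\ell}_i(t)]\,[v_i(t) - \hat{v}_i(t)]}{\hat{v}_i^2(t)} \tag{2.24}$$

Using Eq (2.24), we have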


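$$\hat{T}_{ai}(t) = \text{conditional mean} = \frac{L - \hat{\ell}_i(t)}{\hat{v}_i(t)}$$

$$\sigma_{ai}^2(t) = \text{conditional variance} = \frac{\Sigma_{i22}(t) + \Sigma_{i33}(t)\, \hat{T}_{ai}^2(t) + 2\, \Sigma_{i23}(t)\, \hat{T}_{ai}(t)}{\hat{v}_i^2(t)} \tag{2.25}$$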
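
The linearized statistics of Eq (2.25) and the SNR condition of the footnote can be evaluated directly. The sketch below is illustrative only: the function names and the numerical values are hypothetical, and the arguments `s22`, `s33`, `s23` stand for the covariance elements $\Sigma_{i22}$, $\Sigma_{i33}$, $\Sigma_{i23}$.

```python
import math


def time_available_stats(L, l_hat, v_hat, s22, s33, s23):
    """Linearized mean and variance of T_a = (L - l) / v, following Eq (2.25)."""
    t_hat = (L - l_hat) / v_hat
    var = (s22 + s33 * t_hat ** 2 + 2.0 * s23 * t_hat) / v_hat ** 2
    return t_hat, var


def snr_db(v_hat, s33):
    """Velocity signal-to-noise ratio in dB, 10*log10(v_hat^2 / Sigma_33)."""
    return 10.0 * math.log10(v_hat ** 2 / s33)


# Hypothetical numbers: window length 10, position 4, velocity 2.
t_hat, var = time_available_stats(10.0, 4.0, 2.0, s22=0.04, s33=0.01, s23=0.0)
high_snr = snr_db(2.0, 0.01) > 12.0  # Gaussian approximation is reasonable
```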


Due to the scaling nature of the noise processes in the information
processor, one might expect that $\Sigma_{i22} \simeq \epsilon_1\, \hat{\ell}_i^2(t)$; $\Sigma_{i23} \simeq \epsilon_2\, \hat{\ell}_i(t)\, \hat{v}_i(t)$;
and $\Sigma_{i33} \simeq \epsilon_3\, \hat{v}_i^2(t)$, where $\epsilon_1, \epsilon_2, \epsilon_3 > 0$. Therefore, Eq (2.25) implies,
albeit heuristically, that

$$\sigma_{ai}(t) = \epsilon\, \hat{T}_{ai}(t); \quad \epsilon > 0$$

Thus, the standard deviation of time available, $\sigma_{ai}(t)$, is likely to scale
with its conditional mean, $\hat{T}_{ai}(t)$. This is intuitively appealing.

† SNR = $10 \log_{10} \left( \hat{v}_i^2 / \Sigma_{i33} \right)$ should be (approximately) greater than 12 dB. This
condition is almost always satisfied in man-machine applications.

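In summary, the decision state variables, $x_{di} = [T_{Ri}, T_{ai}]'$, of the
DDM are assumed to be normal with the non-stationary perceived density
and distribution functions, $\gamma_i(T_{Ri};t)$, $\Gamma_i(T_{Ri};t)$ and $\phi_i(T_{ai};t)$, $\Phi_i(T_{ai};t)$,
respectively. That is,

$$\gamma_i(T_{Ri};t) \sim N[\hat{T}_{Ri}(t);\ \sigma_{Ri}^2(t)], \qquad \phi_i(T_{ai};t) \sim N[\hat{T}_{ai}(t);\ \sigma_{ai}^2(t)] \tag{2.26}$$

where $\sigma_{Ri}^2(t)$ is the conditional variance of Eq (2.23) and $\sigma_{ai}^2(t)$ is that of Eq (2.25).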

The conditional Gaussian statistics of the decision state form an important
input to the decision process as shown in Fig. 7.

2.6 Decision Strategy 

In this section, the multi-task decision problem is formulated in 
the framework of a non-stationary, semi-Markov decision process (SMDP). 

Via this formulation, the combined statistics of the decision states of 
N tasks are used to compute the transition probabilities among the 
various process states for each of the decision alternatives. The transition
probabilities, along with the task values, are used to determine the
attractiveness measures of tasks, employing the subjectively expected 
value (SEV) as a criterion of performance. These measures form an input 
to a stochastic choice model that generates the decision probabilities. 

The decision process is depicted in Fig. 9, and is elaborated next. 


[Figure 9 block diagram: SMDP Formulation → Transition Probabilities → SEV Criterion → Attractiveness Measures {M_i} → Stochastic Choice Model]

Fig. 9 : HUMAN DECISION PROCESS


2.6.1 Semi-Markov Decision Process Formulation 

Recall that a non-stationary SMDP is characterized by the hexad
(S(t), D(t), E(t), T(t), H(t), r(t)), where

S(t) = set of process (Markov) states of the system (state space)

D(t) = set of possible decisions (action set)

E(t) = set of events (event set)

T(t) = set of transformation rules that describe the changes in
the state. This is usually expressed in terms of transition
probabilities.

H(t) = holding time function that determines how long the system
stays in a given state before making a transition to another
specified state. This is expressed in terms of holding time
density functions.

r(t) = set of rewards associated with each state transition (reward
structure).


Thus, in order to formulate the multi-task decision problem as an SMDP,
we need to specify the process descriptors S(t), E(t), T(t), H(t), r(t)
and D(t).













A. State Space, S(t)

The state space S is time invariant and consists of $2^K$ elements
corresponding to the $2^K$ possible realizations of the process state $s$, where
K is the system capacity. Symbolically,

S = {set of $2^K$ process states, $s$}

Associated with a process state $s$ at time t, there exist N pairs of
decision state variables $\{T_{Ri}(t), T_{ai}(t)\}$, $i \in A(t)$. Here, A(t) is the
set of N available tasks in the opportunity window at time t.

B. Event Set, E(t), and the Transformation Rule, T(t)

The transformation rule (or the law of motion), T(t), is expressed
in terms of transition probabilities $p^i_{ss'}(t)$, where $s$ is the process
state at time t, $s'$ is the destination process state after a random holding
time $\tau$ in process state $s$, and $i \in A(t)$ denotes the action on a task i in
the opportunity window. The destination state $s'$ depends on the values of
(N+1) independent random variables, $T_{Ri}(t)$ and $T_{am}(t)$, $m \in A(t)$, associated
with the process state $s$ and the decision to act on task i at time
t. It is clear that a decision to act on task i results in one of the
following process state transitions shown in Fig. 10.

(i) Successful completion or loss of task i: The task i is said
to be successfully completed if the random variable $T_{Ri}(t)$
of task i is greater than zero, but is less than the available
times, $T_{am}(t)$, of all the tasks, including i, in the opportunity
window. On the other hand, task i is said to be lost if
the random variable $T_{ai}(t)$ is greater than zero, but less
than $T_{Ri}(t)$ and $T_{aj}(t)$, $j \neq i$. In any case, the new process
state $s' = s - e_i$, where $e_i$ is a K-dimensional unit row vector
whose i-th component is one and whose other components are
zero.

(ii) Loss of a task $j \neq i$: This event occurs if $T_{aj}(t)$ is greater
than zero, but less than $T_{Ri}(t)$ and $T_{am}(t)$, $m \neq j$. When this
event occurs, the new process state is $s' = s - e_j$.

Note that this formulation assumes complete ignorance of the random
variables associated with the future arrivals on the (K-N) empty lines.
That is, transitions to process states corresponding to arrivals on
empty lines are not included in this formulation. This implies that the
decision strategy depends only on the characteristics of tasks in the
opportunity window. If the probabilistic information regarding future
arrivals is available, it can be incorporated into the decision strategy.
The reader is referred to Ref. [10] for details.

[Figure 10 diagram; recoverable labels: "successful completion or loss of task i", "time t + τ"]

Fig. 10 : PROCESS STATE TRANSITION DIAGRAM OF THE MTDP
Thus, the destination process state, $s'$, and the random holding time,
$\tau$, depend on the outcome of a race among the (N+1) competing, non-stationary
random processes $T_{Ri}(t)$ and $T_{am}(t)$, $m \in A(t)$, associated with the process
state $s$ and action i. This type of semi-Markov decision process,
wherein the state transitions are determined by a race among several
random processes, is known as a "competing semi-Markov decision process"
[21]. It should be emphasized that the analysis of the MTDP is complicated
by the fact that the transition probabilities and the random holding time
functions are non-stationary.

It is clear from the above event description that there are N
possible process state transitions of interest from state $s$. In general,
the destination process state $s' = s - e_m$, $m \in A(t)$. In the following,
the transition probabilities for the admissible destination process
states are computed. We suppress the time dependence of the density and
distribution functions of the decision states for ease of notation.

(a) Probability of event (i): This is the probability that the new
process state $s' = s - e_i$, given that the present process state is $s$ and
the decision is to act on task i. Thus,


$$p^i_{ss'}(t) = \eta_i(t) + \omega_{ii}(t); \quad s' = s - e_i \tag{2.27}$$

where

$\eta_i(t)$ = P{action on task i, task i is successfully completed, other channels intact}

$$= P\{T_{Ri} \le T_{am} \text{ for all } m \in A(t)\} = \int_0^\infty \prod_{m \in A(t)} [1 - \Phi_m(\tau)]\, \gamma_i(\tau)\, d\tau$$

$\omega_{ii}(t)$ = P{action on task i, task i is lost, other channels intact}

$$= P\{T_{ai} < T_{Ri};\ T_{ai} < T_{am} \text{ for all } m \in A(t),\ m \neq i\} = \int_0^\infty [1 - \Gamma_i(\tau)] \prod_{\substack{m \in A(t) \\ m \neq i}} [1 - \Phi_m(\tau)]\, \phi_i(\tau)\, d\tau$$

(b) Probabilities associated with event (ii): This is the probability
that the new state $s' = s - e_j$, $j \neq i$, given that the present state is
$s$ and the decision is to act on task i. Therefore,

$$p^i_{ss'}(t) = \omega_{ij}(t); \quad s' = s - e_j,\ j \neq i \tag{2.28}$$

where

$\omega_{ij}(t)$ = P{action on task i, an accessible task j other than i is lost, all the other tasks intact}

$$= P\{T_{aj} \le T_{Ri};\ T_{aj} < T_{am} \text{ for all } m \in A(t),\ m \neq j\} = \int_0^\infty [1 - \Gamma_i(\tau)] \prod_{\substack{m \in A(t) \\ m \neq j}} [1 - \Phi_m(\tau)]\, \phi_j(\tau)\, d\tau$$

In summary, the N transition probabilities for each $i \in A(t)$ are

$$p^i_{ss'}(t) = \begin{cases} \eta_i(t) + \omega_{ii}(t); & s' = s - e_i \\ \omega_{ij}(t); & s' = s - e_j,\ j \neq i \end{cases} \tag{2.29}$$
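
The race integrals behind Eqs (2.27)-(2.29) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a plain trapezoidal rule rather than the quadrature formulae cited in the text, it treats the Gaussian statistics as fixed at the decision instant, and the names `transition_probs`, `t_r` and `t_a` are hypothetical. Here `t_r` is the (mean, standard deviation) of $T_{Ri}$ for the task acted upon, and `t_a` maps every task m in the window (including i) to the (mean, standard deviation) of $T_{am}$.

```python
import math


def npdf(x, mean, sd):
    """Gaussian density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def ncdf(x, mean, sd):
    """Gaussian cumulative distribution."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


def transition_probs(t_r, t_a, n=4000):
    """Return (eta_i, {m: omega_im}) by trapezoidal integration over tau >= 0."""
    upper = max([t_r[0] + 8.0 * t_r[1]] + [m + 8.0 * s for m, s in t_a.values()])
    h = upper / n
    eta = 0.0
    omega = {j: 0.0 for j in t_a}
    for k in range(n + 1):
        tau = k * h
        w = 0.5 if k in (0, n) else 1.0              # trapezoid end weights
        surv_a = {m: 1.0 - ncdf(tau, *t_a[m]) for m in t_a}
        surv_r = 1.0 - ncdf(tau, *t_r)
        # Task i completes first: T_Ri = tau, every T_am still ahead.
        eta += w * npdf(tau, *t_r) * math.prod(surv_a.values())
        for j in t_a:
            # Task j is lost first: T_aj = tau, T_Ri and the other T_am ahead.
            others = math.prod(s for m, s in surv_a.items() if m != j)
            omega[j] += w * npdf(tau, *t_a[j]) * surv_r * others
    return eta * h, {j: v * h for j, v in omega.items()}


eta, omega = transition_probs((2.0, 0.4), {"i": (3.0, 0.6), "j": (5.0, 1.0)})
```

When the probability mass below zero is negligible, `eta` plus the sum of the `omega` values should be close to one, which gives a quick sanity check on the inputs.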


In Ref. [10], the numerical quadrature formulae of Hermite [22], and of Steen,
Byrne and Gelbard [23], were suggested as a means to compute the required
transition probabilities. However, the computation of transition probabilities
can be greatly simplified using Luce's choice axiom [24-27],
which is ideally suited to determine the probability that a certain
random variable is the minimum (or maximum) among a set of random variables.
This is precisely the problem of interest in generating the transition
probabilities. For example, $\eta_i(t)$, the probability that $T_{Ri}(t)$ is
less than $T_{am}(t)$, $m \in A(t)$, can be computed via Luce's choice axiom
according to

$$\eta_i(t) = \left[ 1 + \sum_{m \in A(t)} \frac{P\{T_{am}(t) - T_{Ri}(t) < 0\}}{P\{T_{Ri}(t) - T_{am}(t) < 0\}} \right]^{-1} \tag{2.30a}$$

The main assumption underlying Luce's choice axiom is that the removal of
some alternatives (random variables, in our case) does not alter the relative
probabilities of choice among the remaining alternatives. In other
words, the presence or absence of an alternative is irrelevant to the
relative probabilities of choice among the remaining alternatives, although
the individual probabilities will generally be affected. The proof
of the form of Eq (2.30a) is included in Appendix A.

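Since the decision state variables are assumed to be Gaussian, Eq
(2.30a) simplifies to

$$\eta_i(t) = \left[ 1 + \sum_{m \in A(t)} \frac{1 + \mathrm{Erf}(\Delta_{im})}{1 - \mathrm{Erf}(\Delta_{im})} \right]^{-1} \tag{2.30b}$$

where

$$\Delta_{im} = \frac{\hat{T}_{Ri} - \hat{T}_{am}}{\sqrt{2\,(\sigma_{Ri}^2 + \sigma_{am}^2)}} \quad \text{and} \quad \mathrm{Erf}(a) = \frac{2}{\sqrt{\pi}} \int_0^a e^{-x^2}\, dx; \quad \mathrm{Erf}(\infty) = 1$$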


Using a well known result [24] that the logistic function is a good
approximation to the cumulative normal, Eq (2.30b) can be further simplified
as

$$\eta_i(t) = \left[ 1 + \sum_{m \in A(t)} \frac{1 + \exp(\Delta_{im})}{1 + \exp(-\Delta_{im})} \right]^{-1} \tag{2.30c}$$

The computation of the remaining transition probabilities $\omega_{im}(t)$, $m \in A(t)$,
proceeds along similar lines to Eq (2.30).
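
The Luce-type forms of Eq (2.30) are cheap to evaluate. The sketch below is a hedged illustration, not the report's code: it assumes $\Delta_{im} = (\hat{T}_{Ri} - \hat{T}_{am}) / \sqrt{2(\sigma_{Ri}^2 + \sigma_{am}^2)}$, writes the ratio $(1 + \mathrm{Erf})/(1 - \mathrm{Erf})$ with the complementary error function for numerical safety, and uses hypothetical function names. Each random variable is passed as a (mean, standard deviation) pair.

```python
import math


def delta(t_r, t_a):
    """Normalized mean difference between required and available time."""
    return (t_r[0] - t_a[0]) / math.sqrt(2.0 * (t_r[1] ** 2 + t_a[1] ** 2))


def eta_luce_erf(t_r, t_as):
    """Error-function form: odds of T_am beating T_Ri, summed over tasks m."""
    odds = 0.0
    for t_a in t_as:
        d = delta(t_r, t_a)
        # (1 + erf(d)) / (1 - erf(d)) == erfc(-d) / erfc(d)
        odds += math.erfc(-d) / math.erfc(d)
    return 1.0 / (1.0 + odds)


def eta_luce_logistic(t_r, t_as):
    """Logistic form, with the exponent taken exactly as written in the text."""
    odds = 0.0
    for t_a in t_as:
        d = delta(t_r, t_a)
        odds += (1.0 + math.exp(d)) / (1.0 + math.exp(-d))
    return 1.0 / (1.0 + odds)


eta_two = eta_luce_erf((2.0, 0.4), [(3.0, 0.6), (5.0, 1.0)])
```

With a single competitor the error-function form reduces to the exact Gaussian probability $P\{T_{Ri} < T_{am}\}$; with several competitors it is an approximation that rests on the choice axiom.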

C. Holding Time Function, H(t)

The holding time function is specified in terms of holding time
density functions, $h^i_{ss'}(\tau)$, which determine how long the system
stays in process state $s$ before making a transition to a specified state
$s'$. The density function $h^i_{ss'}(\tau)$ can be obtained by first determining
the joint probability - probability density function, $f^i_{ss'}(\tau)$, for the
event that a system in process state $s$ will make its next transition to
process state $s'$ after a holding time $\tau$, while acting on a task $i \in A(t)$.
This event will occur in the competing SMDP only if the random variable
representing the destination process state $s'$ takes on the value $\tau$ and all
the other N random variables are greater than $\tau$. Therefore,

$$f^i_{ss'}(\tau) = \begin{cases} b_i(\tau) = \displaystyle\prod_{\substack{m \in A(t) \\ m \neq i}} [1 - \Phi_m(\tau)] \cdot \{\gamma_i(\tau)[1 - \Phi_i(\tau)] + \phi_i(\tau)[1 - \Gamma_i(\tau)]\}; & s' = s - e_i \\[2ex] g_{ij}(\tau) = \phi_j(\tau) \cdot [1 - \Gamma_i(\tau)] \cdot \displaystyle\prod_{\substack{m \in A(t) \\ m \neq j}} [1 - \Phi_m(\tau)]; & s' = s - e_j,\ j \neq i \end{cases} \tag{2.31}$$
Note that the transition probabilities, $p^i_{ss'}(t)$, of Eq (2.27) are related
to $f^i_{ss'}(\tau)$ via

$$p^i_{ss'}(t) = \int_0^\infty f^i_{ss'}(\tau)\, d\tau \quad \text{for all } s, s' \in S$$

The holding time density functions, $h^i_{ss'}(\tau)$, are given by

$$h^i_{ss'}(\tau) = \frac{f^i_{ss'}(\tau)}{p^i_{ss'}(t)} \quad \text{for all allowable } s, s' \tag{2.32}$$

It should be emphasized that the functions $f^i_{ss'}(\tau)$ and $h^i_{ss'}(\tau)$ are
non-stationary. Closed form expressions for the holding time density functions
are not possible and, hence, they must be computed numerically [10,22,23].

D. Reward Structure, r(t)

When the transition from process state $s$ to process state $s'$ occurs
at some time $t + \tau$, the DM earns an expected reward $r^i_{ss'}(\tau)$ in the form
of a bonus. That is, the DM earns a lump sum payment at the time of state
transition, a payment that depends on process states $s$, $s'$; the holding
time $\tau$; and action i. In the present MTDP, a reward ("bonus") of $r_i(t)$
units is earned while acting on task i if and only if the new process
state $s' = s - e_i$ and the task i is successfully completed. The conditional
probability that task i is successfully completed, given that the
new process state $s' = s - e_i$ and action on task i, is $\eta_i(t)/[\eta_i(t) + \omega_{ii}(t)]$.
In addition, if it is assumed that there is a penalty of $q_m(t)$ units for
losing a task $m \in A(t)$, then the reward structure can be described by

$$r^i_{ss'}(t) = \begin{cases} \dfrac{r_i(t)\,\eta_i(t) - q_i(t)\,\omega_{ii}(t)}{\eta_i(t) + \omega_{ii}(t)}; & s' = s - e_i \\[2ex] -\,q_m(t); & s' = s - e_m,\ m \in A(t),\ m \neq i \end{cases} \tag{2.33}$$
It should be noted that $r^i_{ss'}(t)$ is the conditional expected reward, given
the holding time. Even though there are no penalties for missed tasks in
the present MTDP, $q_m(t)$ of Eq (2.33) could still represent the subjective
losses (utilities) assigned by the DM. A logical choice for the subjective
values $q_m(t)$ is the objective rewards $r_m(t)$. The reward structure
of Eq (2.33) can be generalized to include decision dependent penalties,
as well as a continuous yield rate [10].

E. Action Set, D(t)

At any time t, the DM is provided with (N+1) choices: act on one of
the N tasks in the accessible set A(t) or not act on any task (i.e., do
nothing or monitor). Thus, we have

D(t) = A(t) ∪ {0}

The number of choices may differ from one process state to another. Some 
process states may have only one alternative and, therefore, choice is 
constrained whenever such a process state is occupied. The DM's problem 
is to select the actions (over time) that will make the operation of the 
system most rewarding. 

2.6.2 Attractiveness Measures, $M_i(t)$

The basic assumption underlying the human response modeling is that 
a well-trained human behaves in a normative, rational manner subject to 
his inherent limitations. We interpret this, mathematically, in terms of 
maximizing a specified metric. As with the OCM, the choice of a metric 
may be either objective (specified by the experimenter), or subjective 
(adopted by the human in performing and relating to the task). In the
present experimental context, the objective metric involves the maximization
of reward earned. Since the proposed model is normative in construct,
we need to specify a subjective metric. If the subjective metric is the
same as the objective metric, then, as shown in [10], a functional equation
for the optimal decision strategy can be derived using dynamic programming
(DP) and semi-Markov decision process theory. However, the
tree-folding back procedure of the DP presents serious computational 
difficulties ("curse of dimensionality"), and requires the evaluation and 
specification of all future courses of action before any task is acted 
upon. The latter point is at variance with the current psychological 
knowledge of a human's inability to foresee the complete future effects of
his present decisions. If a finite stage DP is advocated as a compromise,
we are faced with the dilemma of selecting the number of stages.
These observations led us to the choice of the subjectively expected 
value (SEV) of a decision as our metric (or "attractiveness measure") for 
optimization. It is easy to show [10] that SEV corresponds to a myopic 
(one-stage) policy, which can be derived from the DP formulation by 
completely disregarding future rewards. That is, the myopic decision 
policy acts at every time t, as though the present decision was the 
final one. Conceptually, this approach is similar to the "open-loop-feedback-optimal"
approach of control theory, wherein the present value
of future information is neglected.†

† The DP formulation of the optimal strategy is of theoretical importance
in its own right, as it provides a general and flexible analytic framework
for the analysis of dynamic decision-making under uncertainty.
This framework covers all cases where the present decisions can affect
future information, uncertainties associated with the random processes
of the system, future rewards and future actions. More importantly, it
was shown in [10] that the optimal decision strategy subsumes Tulga's
deterministic, dynamic sequencing formulation of the MTDP [8], as well
as the Markov decision problem [21], and several single processor
sequencing theoretic rules [28]. The Markov decision formulation was
applied and extended in [11] to determine stationary, non-preemptive
priority policies in a multi-class queueing system with finite capacity
and reneging (i.e., impatient customers).

The attractiveness measure $M_i(t)$ of a decision to act on task i
is simply the subjectively expected discounted value of reward from the
first transition out of process state $s$, regardless of when it occurs.
It is given by

$$M_i(t) = \sum_{\text{all } s'} p^i_{ss'}(t) \int_0^\infty r^i_{ss'}(\tau)\, e^{-\alpha\tau}\, h^i_{ss'}(\tau)\, d\tau \quad \text{for all } i \in A(t)$$

The use of the discount factor, $\alpha$, in the computation of attractiveness
measures, $M_i(t)$, may be interpreted in two ways: First, it can account for
the DM's present perception of future rewards. That is, future rewards
are worth less at the present time ("reward today is sweeter than reward
tomorrow"). Howard [29] calls this explanation for $\alpha$ the "time preference"
or the "greed-impatience trade off". The second interpretation is
in terms of the uncertainty associated with the duration of the period
during which rewards can be earned.

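For the specific MTDP, using Eqs (2.29), (2.32) and (2.33), $M_i(t)$
can be rewritten as

$$M_i(t) = [r_i(t)\,\eta_i(t) - q_i(t)\,\omega_{ii}(t)]\,\beta_i(\alpha;t) - \sum_{\substack{m \in A(t) \\ m \neq i}} q_m(t)\,\omega_{im}(t)\,\beta_{im}(\alpha;t) \quad \text{for all } i \in A(t) \tag{2.34}$$

where $\beta_i(\alpha;t)$ and $\beta_{im}(\alpha;t)$ are the exponential (Laplace) transforms of
the holding time density functions $b_i(\tau)/[\eta_i(t) + \omega_{ii}(t)]$ and $g_{im}(\tau)/\omega_{im}(t)$,
respectively. They are given by

$$\beta_i(\alpha;t) = \frac{1}{\eta_i(t) + \omega_{ii}(t)} \int_0^\infty e^{-\alpha\tau}\, b_i(\tau)\, d\tau; \quad \beta_i(0;t) = 1$$

and

$$\beta_{im}(\alpha;t) = \frac{1}{\omega_{im}(t)} \int_0^\infty e^{-\alpha\tau}\, g_{im}(\tau)\, d\tau; \quad \beta_{im}(0;t) = 1$$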
The attractiveness measure associated with the "do nothing" decision,
$M_0(t)$, or that of the monitoring decision, $M_m(t)$, depends on whether or not
parallel monitoring is allowed.

(i) Parallel monitoring: When parallel monitoring is allowed,
$M_0(t)$ can be interpreted as the human's indifference towards,
or perception of, small rewards. In the present context, the
"do nothing" decision is made only if none of the available
tasks can be completed, or if there are no tasks to be processed.
We use

$$M_0(t) = -\sum_{m \in A(t)} q_m(t)\, \omega_{0m}(t)\, \beta_{0m}(\alpha;t) \tag{2.35}$$

where $\omega_{0m}(t)$ and $\beta_{0m}(\alpha;t)$ are computed using a constant
"fictitious" processing time for the null task, $T_{R0}$. Thus,
$M_0(t)$ represents the loss due to disappearance of all tasks.
The value of $T_{R0}$ is chosen to match the data, but is a constant
across experimental conditions (A-D).

(ii) No Parallel monitoring : In this case, monitoring of tasks 
other than the one being acted upon is not allowed (i.e., 
condition B^), but monitoring is a separate valid decision. 
Here, we postulate that the human makes this decision only 
if the enhanced knowledge of the task characteristics off¬ 
sets any reward he may have gained by acting on one of the N 

tasks. That is, $M_m(t)$ is the average value of gathering information
for $\delta$ sec (integration time step) starting at time
t, and is given by

$$M_m(t) = \frac{1}{N} \sum_{i \in A(t)} [M_i(t+\delta) - M_i(t)] \tag{2.36}$$
Thus, the monitoring decision is invoked only if the information
value is sufficiently high to preclude action on one
of the tasks in the opportunity window. The attractiveness
measure $M_m(t)$, in conjunction with the measures $M_i(t)$, is
used to compute the monitoring probability, $P_{dm}(t)$.

The form of Eqs. (2.34-2.36) for attractiveness measures is partic¬ 
ularly appealing, as it relates to the "net gain" of each of the task 
alternatives available to the decision-maker at time t. The first term 
in Eq (2.34) represents the "potential gain" of acting on task i at time 
t, whereas the summation term represents the "potential loss" due to the 
disappearance of all the other tasks. The criterion explicitly considers 
the human's inability to envisage all the future courses of actions, as 
would be required by the DP formulation. Moreover, Eq (2.34) includes the
human's preference for rewards that are distributed in time via the discount
factor, $\alpha$.

Sensitivity analysis of the DDM (chapter III) has shown that a
value of $\alpha = 0$ gives the best possible match to the data. This could
imply either of two things: First, humans do not discount rewards
distributed over a short-time horizon (one to five seconds in our case).
A second and more plausible implication is that the use of a discount
factor in the analysis of dynamic decision-making may be artificial.
That is to say, once the human information-processing limitations are
included and a myopic policy is postulated for the human decision strategy,
it may not be necessary to employ the discount factor, $\alpha$. In any case,
when $\alpha$ is zero, Eq (2.34) simplifies to

$$M_i(t) = r_i(t)\,\eta_i(t) - \sum_{m \in A(t)} q_m(t)\,\omega_{im}(t); \quad i \in A(t) \tag{2.37}$$
Thus, there is no need to numerically evaluate the holding time density
functions. Note that Eq (2.37) is similar to the SEU model of Eq (1.4)
with appropriate interpretation.

In summary, the proposed myopic decision strategy in the general
case, but with $\alpha = 0$, involves the computation of only 2N(N+1) transition
probabilities to evaluate the (N+1) attractiveness measures, $M_m(t)$
and $M_i(t)$, $i \in A(t)$. The required transition probabilities may be computed
in a straightforward manner via Luce's choice axiom. Therefore,
the computational load of the proposed decision strategy is insignificant
compared to that of the truly optimal DP formulation.
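
With the discount factor set to zero, the attractiveness measure of Eq (2.37) is a single weighted sum. The sketch below is illustrative only; the function name, the dictionary layout and the numbers are hypothetical, with `q` holding the subjective penalties $q_m(t)$ and `omega_i` the loss probabilities $\omega_{im}(t)$ for every task m in the window (including i).

```python
def attractiveness(r_i, eta_i, q, omega_i):
    """Undiscounted attractiveness: potential gain minus potential losses."""
    gain = r_i * eta_i
    loss = sum(q[m] * omega_i[m] for m in omega_i)
    return gain - loss


# Hypothetical numbers: task "a" is acted upon, task "b" also in the window.
m_a = attractiveness(
    r_i=10.0,
    eta_i=0.9,
    q={"a": 10.0, "b": 4.0},          # penalties set equal to task values
    omega_i={"a": 0.08, "b": 0.02},
)
```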

2.6.3 Stochastic Choice Model 

A decision model that selects the task with maximum attractiveness 
measure yields a (1-0) response, and suggests that the decision-maker 
would always make the same sequence of decisions under similar condi¬ 
tions. However, it is well known [12] that people fluctuate in their 
response to the same stimulus, even when there are no changes in their 
information or resources. Fluctuations in choice can arise because the 
subject is unable to discriminate precisely, or because he may make 
calculating, response or perceptual errors. The stochastic choice models 
assume that, although the attractiveness measures, $M_i(t)$, could be
characterized by a single fixed number, the subjects perceive it as a
random variable, $\tilde{M}_i(t)$, with some distribution (usually Gaussian). The
randomness may be interpreted in terms of the uncertainties associated
with the human perception of task values, $r_i(t)$. Below, we again invoke
Luce's choice axiom to compute the decision probabilities, $P_{di}(t)$:

(i) Parallel monitoring:

$$P_{di}(t) = \left[ 1 + \sum_{\substack{k \in D(t) \\ k \neq i}} \frac{P\{\tilde{M}_k(t) - \tilde{M}_i(t) > 0\}}{P\{\tilde{M}_i(t) - \tilde{M}_k(t) > 0\}} \right]^{-1}; \quad i \in D(t) \tag{2.38}$$

(ii) No parallel monitoring:

$$P_{dm}(t) = \left[ 1 + \sum_{k \in A(t)} \frac{P\{\tilde{M}_k(t) - \tilde{M}_m(t) > 0\}}{P\{\tilde{M}_m(t) - \tilde{M}_k(t) > 0\}} \right]^{-1} \tag{2.39}$$

The decision probabilities $P_{di}(t)$ are given by a relation similar
to Eq (2.38) with $M_m(t)$ replacing $M_0(t)$.

In Eqs (2.38-2.39), we assume that $\tilde{M}_i(t)$ are Gaussian random
variables with mean $M_i(t)$ and variance $\sigma_{Mi}^2(t)$, whose standard
deviation scales with $M_i(t)$. That is,

$$\sigma_{Mi}(t) = c\,|M_i(t)| \quad (c \approx .2\text{-}.4) \tag{2.40}$$

where c is the coefficient of variation. Note that the
forms of Eqs (2.38-2.39) can be employed with any decision
strategy.
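
The stochastic choice rule of Eqs (2.38) and (2.40) can be sketched as follows. This is a minimal illustration under assumptions that the text does not state: the perceived measures are taken to be mutually independent, so that the difference of two of them is Gaussian with the sum of the variances, and the names `p_greater` and `decision_probs` are hypothetical. The dictionary `M` maps each alternative (including the null decision, if present) to its attractiveness measure.

```python
import math


def p_greater(m_a, m_b, c):
    """P{Ma~ - Mb~ > 0} for independent Gaussians with sd = c * |mean|."""
    sd = math.hypot(c * abs(m_a), c * abs(m_b))
    if sd == 0.0:                       # both measures are exactly zero
        return 0.5 if m_a == m_b else float(m_a > m_b)
    return 0.5 * math.erfc(-(m_a - m_b) / (sd * math.sqrt(2.0)))


def decision_probs(M, c=0.3):
    """Luce-type decision probabilities over the alternatives in M."""
    probs = {}
    for i in M:
        odds = 0.0
        for k in M:
            if k == i:
                continue
            win = p_greater(M[i], M[k], c)
            if win == 0.0:              # alternative i can never be preferred
                odds = math.inf
                break
            odds += p_greater(M[k], M[i], c) / win
        probs[i] = 1.0 / (1.0 + odds)
    return probs


probs = decision_probs({"task 1": 8.0, "task 2": 5.0, "do nothing": -1.0})
```

For two alternatives the probabilities sum to one exactly; for more than two, Gaussian pairwise probabilities do not in general satisfy the choice axiom, so the values may need renormalization if they are to be used as a distribution.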


2.7 Model Predictions 

The dynamic decision model can be used in a straightforward manner
to generate predictions of $P_{di}(t)$, as well as of other response measures
that can be computed from the experimental data:

(i) The completion probability, $P_{ci}(t)$, is the probability that
task i is completed by time t. Thus,

$$P_{ci}(t) = P\{T_{Ri}(t) < 0\} = \Gamma_i(0;t); \quad i \in A(t) \tag{2.41}$$

When $P_{ci}(t) > .99$, the task is assumed to be successfully
completed and, therefore, is removed from the model.
(ii) The error probability, $P_e(t)$, is the probability that the
human commits an error, i.e., starts acting on a task he
cannot possibly complete. Thus, $P_e(t)$ is the sum over all tasks
of the probability of the joint event: action on task i and
the time required to complete task i is greater than the time
available to work on it. Therefore,

$$P_e(t) = \sum_{i \in A(t)} P\{T_{Ri}(t) - T_{ai}(t) > 0\} \cdot P_{di}(t) \tag{2.42a}$$

Since $T_{Ri}(t)$ and $T_{ai}(t)$ are assumed to be independent and
conditionally Gaussian random variables, Eq (2.42a) becomes

$$P_e(t) = \sum_{i \in A(t)} \left[ \frac{1 + \mathrm{Erf}(\Delta_{ii})}{2} \right] P_{di}(t) \tag{2.42b}$$

where $\Delta_{ii} = (\hat{T}_{Ri} - \hat{T}_{ai}) / \sqrt{2\,(\sigma_{Ri}^2 + \sigma_{ai}^2)}$ and $\mathrm{Erf}(\Delta_{ii})$ are defined following Eq (2.30b).
(iii) The average accumulated reward, R(t), is the average total
reward earned up to the present time t. It is an overall
response measure, and is given by

$$R(t) = \int_0^t \left\{ \sum_{i \in A(\sigma)} r_i(\sigma)\, \frac{dP_{ci}(\sigma)}{d\sigma} \right\} d\sigma \tag{2.43}$$

(iv) The normalized incremental reward, $W_c(t)$, is the average
instantaneous reward-earning rate, and is a measure of instantaneous
performance. Thus, $W_c(t)$ is the weighted sum of completion
probabilities given by

$$W_c(t) = \frac{1}{K} \sum_{i \in A(t)} r_i(t)\, P_{ci}(t) \tag{2.44}$$

where K is the system capacity (= 5).


(v) The total expected tasks completed, $N_c$, can be computed by
assuming that all tasks $i \in A(t)$ with $P_{ci}(t) > \theta$ ($\approx .99$)
are successfully acted upon. Thus,

$$N_c = \int_0^T \sum_{i \in A(t)} \delta[P_{ci}(t) - \theta]\, dt \tag{2.45}$$

where $\delta[P_{ci}(t) - \theta]$ is the Dirac delta function and T is the
duration of the experiment.

(vi) The average time spent on a task on line i, $T_{si}$, is the time the
human attends to task i on the average. It is given by

$$T_{si} = \int_{t_{0i}}^{t_{fi}} P_{di}(t)\, dt \tag{2.46}$$

where $t_{0i}$ and $t_{fi}$ are the times between which a task is on
line i.
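
Two of the response measures above are simple enough to sketch directly: the completion probability of Eq (2.41), which is a Gaussian distribution function evaluated at zero, and the time spent on a task of Eq (2.46), approximated here by a trapezoidal sum over sampled decision probabilities. The function names and sample values are hypothetical.

```python
import math


def completion_probability(t_r_mean, t_r_sd):
    """P{T_R < 0} for a Gaussian time-required estimate."""
    return 0.5 * math.erfc(t_r_mean / (t_r_sd * math.sqrt(2.0)))


def time_spent(times, p_d):
    """Trapezoidal approximation to the integral of P_d(t) over the samples."""
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (p_d[k] + p_d[k - 1]) * (times[k] - times[k - 1])
    return total


# A task whose estimated time required has dropped three standard
# deviations below zero exceeds the .99 removal threshold.
done = completion_probability(-3.0, 1.0) > 0.99
t_s = time_spent([0.0, 1.0, 2.0, 3.0], [0.0, 0.5, 0.5, 0.0])
```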

In the next chapter, model predictions of the above response mea¬ 
sures are compared with the experimental results for the conditions A, B, 
C, D and B . 

y 

2.8 Summary 

In this chapter, an analytic model of human task sequencing performance
was developed. The modeling approach borrowed from the successful
optimal control modeling methodology. The approach taken here and in
[10] is quite general and flexible, and covers all cases where the present
decisions affect future information and future rewards. As with the OCM,
the dynamic decision model (DDM) developed in this chapter consists of
two separable blocks: information-processor and decision-maker. The
information-processor compensates for the human's observation noise,
time-delay and monitoring allocations to produce the best linear unbiased
estimates of the "decision state". The conditional Gaussian statistics
of the decision state constitute a sufficient statistic of the decision
process. The statistics, along with the task values, are used in a
myopic decision policy, based on semi-Markov decision process theory, to
determine the attractiveness measure of each of the decision alternatives.
The measures are subsequently used in a stochastic choice model, which
explicitly considers the human's inability to discriminate precisely, to
generate the decision probabilities.

Some novel features of our modeling approach are the use of the
concept of a decision state; the explicit incorporation of human limitations
at the information-processing and decision-making stages; and its
suitability to assimilate new elements of the task as they become considered
and understood. The last item corresponds to such issues as
precedence restrictions, resource constraints, general reward structures,
non-stationary task characteristics, and even different experimental
paradigms that involve the basic ingredients of monitoring, information-processing
and dynamic decision-making. Moreover, the model may be used
in a covariance propagation mode or in a sample path mode. The first
mode is appropriate for the model-data validation efforts presented in
chapter III. The second mode is suitable for decision-aiding as discussed
in chapter IV.


III. MODEL-DATA VALIDATION STUDIES


In chapter II, the dynamic decision model (DDM) of human task
sequencing performance was developed, and the model's ability to generate
various response measures of interest, viz., $P_{di}(t)$, $P_{ci}(t)$, $P_e(t)$, etc.,
was noted. The present chapter proposes several metrics for assessing
the "goodness of fit" (or "similarity") between the model predictions and
the experimental data, and presents results on the model-data validation
efforts.

3.1 Data analysis 

As mentioned in section 1.5, the data sampled during each run consisted
of the subject's decisions, $d_i(t)$; the task completion status,
$c_i(t)$; and the error sequence, $e_i(t)$. These raw data were ensemble
averaged to obtain empirical estimates of the following response variables:

(i) The decision probability, $P^H_{di}(t)$, of acting on a task of line
i at time t,

$$P^H_{di}(t) = \frac{\sum_{j=1}^{N_s} \sum_{k=1}^{N_{Rj}} d_i^{jk}(t)}{\sum_{j=1}^{N_s} N_{Rj}} \tag{3.1}$$

where

$N_s$ = total number of subjects

$N_{Rj}$ = total number of runs of subject j

and

$d_i^{jk}(t)$ = 1 if subject j was processing a task on line i at time t during run k; 0 otherwise

(ii) The completion probability, $P^H_{ci}(t)$, of having completed a task
on line i by time t,

$$P^H_{ci}(t) = \frac{\sum_{j=1}^{N_s} \sum_{k=1}^{N_{Rj}} c_i^{jk}(t)}{\sum_{j=1}^{N_s} N_{Rj}} \tag{3.2}$$

where

$c_i^{jk}(t)$ = 1 if subject j has completed task i by time t during run k; 0 otherwise

Clearly, $P^H_{ci}(t)$ is a monotonically increasing function of
time. It is reset to zero at the end of the opportunity
window of the present task on line i, i.e., before the arrival
of the next task in the sequence.

(iii) The error probability, $P^H_e(t)$, of engaging a task which cannot
possibly be completed, was calculated from the data via

$$P^H_e(t) = \frac{\sum_{i=1}^{5} \sum_{j=1}^{N_s} \sum_{k=1}^{N_{Rj}} e_i^{jk}(t)}{\sum_{j=1}^{N_s} N_{Rj}} \tag{3.3}$$

where

$e_i^{jk}(t)$ = 1 if subject j was acting on a task of line i at time t during run k that cannot be successfully completed; 0 otherwise


(iv) The average accumulated reward, $R^H(t)$, earned through time t is
related to $P^H_{ci}(t)$ via an expression similar to Eq (2.43).

(v) The normalized incremental reward, $W^H_c(t)$, earned by the human is
given by an equation similar to Eq (2.44).

(vi) The number of expected tasks completed, $N^H_c(t)$, was computed via

$$N^H_c(t) = \frac{1}{\sum_{m=1}^{N_s} N_{Rm}} \sum_{m=1}^{N_s} \sum_{k=1}^{N_{Rm}} \sum_{i=1}^{5} \sum_{j=1}^{N_{Ti}} c_i^{mk}(t_{fij}) \tag{3.4}$$

where $c_i^{mk}$ is as defined in Eq (3.2), $N_{Ti}$ is the total number
of tasks that appear on line i, and $t_{fij}$ is the time at which
a task j of the sequence (i.e., j-th pass) on line i reaches
the end of its opportunity window.

(vii) The average time spent on a task on line i that arrived at
time $t_{0ij}$ and (would have) departed at time $t_{fij}$ during the
j-th pass is given by a relation similar to Eq (2.46). That is,

$$T^H_{sij} = \int_{t_{0ij}}^{t_{fij}} P^H_{di}(t)\, dt; \quad i = 1, 2, \ldots, 5; \quad j = 1, 2, \ldots, N_{Ti} \tag{3.5}$$

where $N_{Ti}$ is as defined in Eq (3.4).
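
The ensemble averages of Eqs (3.1)-(3.3) amount to pooling 0/1 indicator sequences over all runs of all subjects. The sketch below shows one way to do this for a single line i; the nested-list layout `indicators[j][k][n]` (subject j, run k, time sample n) and the function name are assumptions made for illustration, not the data format used in the experiments.

```python
def ensemble_average(indicators):
    """Average 0/1 indicator sequences over all runs of all subjects.

    indicators[j][k] is the sampled sequence for subject j, run k;
    every sequence is assumed to have the same number of samples.
    """
    total_runs = sum(len(runs) for runs in indicators)
    n_samples = len(indicators[0][0])
    return [
        sum(run[n] for runs in indicators for run in runs) / total_runs
        for n in range(n_samples)
    ]


# Two subjects: the first with two runs, the second with one run.
d_line1 = [
    [[1, 0, 1], [1, 1, 0]],
    [[0, 0, 1]],
]
p_d_line1 = ensemble_average(d_line1)
```

Because the denominator is the total number of runs, subjects who contributed more runs carry proportionally more weight, as in Eq (3.1).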

3.2 Measures of Similarity 

In order to assess the closeness of model vs. data results and to 
perform sensitivity studies on the model, it is necessary to define 
"closeness". In this section, we propose several time-history and scalar 
measures of similarity, which are subsequently used as a means to validate 
the model. 

3.2.1 Time-history Metrics 

These measures compare the ensemble-averaged time-history of a 
response variable obtained empirically with that predicted by the DDM. 
Here, we formulate five time-history metrics that appear to be suitable 
in the present multi-task decision paradigm. 

(i) The decision probability comparisons, $P^H_{di}(t)$ versus $P^M_{di}(t)$;
$i = 1, 2, \ldots, 5$.

(ii) The completion probability comparisons, $P^H_{ci}(t)$ versus $P^M_{ci}(t)$;
$i = 1, 2, \ldots, 5$.

(iii) The normalized incremental reward comparisons, $W^H_c(t)$ versus
$W^M_c(t)$. Equivalently, the difference $(W^H_c(t) - W^M_c(t))$, or the
rms difference $W_\sigma(t)$ given by

$$W_\sigma(t) = \left\{ \sum_{i=1}^{5} \left[ r_i(t) \left( P^H_{ci}(t) - P^M_{ci}(t) \right) \right]^2 \right\}^{1/2} \tag{3.6}$$

may be used as a measure of similarity.

(iv) The accumulated reward comparisons, $\bar{R}^H(t)$ versus $\bar{R}^M(t)$.

(v) The error probability comparisons, $P^H_e(t)$ versus $P^M_e(t)$.
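
As an illustration of the rms difference in item (iii), the sketch below assumes that Eq (3.6) is the square root of the summed squares of reward-weighted differences between the human and model completion probabilities. The function name and the `[line][time step]` list layout are hypothetical.

```python
import math


def rms_reward_difference(rewards, p_human, p_model):
    """Return the rms difference at each time step.

    All three arguments are indexed [line][time step]; rewards[i][t] weights
    the difference in completion probability on line i at step t.
    """
    n_steps = len(rewards[0])
    history = []
    for t in range(n_steps):
        total = sum(
            (r[t] * (ph[t] - pm[t])) ** 2
            for r, ph, pm in zip(rewards, p_human, p_model)
        )
        history.append(math.sqrt(total))
    return history


# Hypothetical two-line, two-step example.
rewards = [[2.0, 2.0], [1.0, 1.0]]
p_human = [[0.5, 1.0], [0.0, 1.0]]
p_model = [[0.0, 1.0], [0.0, 0.0]]
w_sigma = rms_reward_difference(rewards, p_human, p_model)
```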

3.2.2 Scalar Metrics 


Below, we propose six scalar metrics that appear to be pertinent
in the multi-task paradigm. The suggested scalar measures are useful in
the model-data validation studies, as well as in understanding the impact
of changes in various model parameters on the DDM predictions.

(i) Action metric, AM, computes the normalized time integral of the
squared error differences between the decision probabilities
$P^H_{di}(t)$ and $P^M_{di}(t)$. That is,

    [Eq. (3.7): not legible in the source]                        (3.7)

where T is the duration of the experiment. The square root
of AM is a measure of the average discrepancy between $P^H_{di}(t)$
and $P^M_{di}(t)$.

(ii) Incremental reward metric, IRM, is the normalized time integral
of the squared, weighted difference of the completion probabilities
$P^H_{ci}(t)$ and $P^M_{ci}(t)$ given by

$$\mathrm{IRM} = \frac{\sum_{i=1}^{5} \int_0^T \left[ r_i(t) \left( P^H_{ci}(t) - P^M_{ci}(t) \right) \right]^2 dt}{\sum_{i=1}^{5} \int_0^T r_i^2(t)\, dt} \tag{3.8}$$

The square root of IRM is a measure of the difference between
the average reward-earning rates of the human and the model.

(iii) Accumulated reward metric, ARM, is the normalized time integral
of the squared difference between the average reward earned
up to that instant of time by the human and the model. Therefore,

$$\mathrm{ARM} = \frac{1}{T} \int_0^T \left[ \frac{\bar{R}^H(t) - \bar{R}^M(t)}{R_{avl}} \right]^2 dt \tag{3.9}$$

where $R_{avl}$ is the maximum available reward during the run.
The square root of ARM is a measure of discrepancy between
the average overall performance of the model and the human.

(iv) Task completion metric, TCM, computes the normalized squared
difference between the average number of tasks completed by
the human and the model as

$$\mathrm{TCM} = \left[ \frac{N^H_c - N^M_c}{N_{avl}} \right]^2 \tag{3.10}$$

where $N_{avl}$ is the total number of available tasks during the
experimental run.

(v) Average time on each task metric, ATTM, calculates the normalized
root-mean-squared sum of the difference between the times
spent on each task by the human and the model according to

    [Eq. (3.11): not legible in the source]                       (3.11)

where [symbol illegible] is defined in Eq (3.4) and [symbol illegible]
is the initial (actual) processing time of a task on line i during
the j-th pass.

(vi) Error probability metric, EPM, is the normalized time integral
of the squared differences between the error probabilities
$P^H_e(t)$ and $P^M_e(t)$ and is given by

$$\mathrm{EPM} = \frac{1}{T} \int_0^T \left[ P^H_e(t) - P^M_e(t) \right]^2 dt \tag{3.12}$$

Note that the normalized scalar measures can range from a
value of 0, corresponding to a perfect fit between the model
and data, to a maximum value of 1.
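
A minimal numerical sketch of three of these scalar measures follows. It assumes uniformly sampled time histories, trapezoidal integration, and normalization by the run duration T (for the time-integral measures) or by the available reward and task totals; these normalizations are a reading of the definitions above rather than confirmed formulas, and all function names are hypothetical.

```python
def trapezoid(values, dt):
    """Trapezoidal integral of uniformly sampled values with spacing dt."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def mean_squared_difference(human, model, dt):
    """(1/T) times the time integral of (human - model)^2, as assumed for EPM."""
    duration = dt * (len(human) - 1)
    squared = [(h - m) ** 2 for h, m in zip(human, model)]
    return trapezoid(squared, dt) / duration


def accumulated_reward_metric(r_human, r_model, r_available, dt):
    """ARM-style measure: rewards are scaled by the available reward first."""
    scaled_h = [r / r_available for r in r_human]
    scaled_m = [r / r_available for r in r_model]
    return mean_squared_difference(scaled_h, scaled_m, dt)


def task_completion_metric(n_human, n_model, n_available):
    """TCM-style measure: squared, normalized difference in tasks completed."""
    return ((n_human - n_model) / n_available) ** 2


# Hypothetical histories sampled every 0.5 sec.
worst_case = mean_squared_difference([1, 1, 1], [0, 0, 0], dt=0.5)
perfect_fit = mean_squared_difference([0.2, 0.4, 0.6], [0.2, 0.4, 0.6], dt=0.5)
arm = accumulated_reward_metric([0, 5, 10], [0, 5, 10], r_available=20, dt=0.5)
tcm = task_completion_metric(8, 6, 10)
```

With probabilities bounded between 0 and 1, the squared difference never exceeds 1, which is why a measure built this way stays between 0 (perfect fit) and 1.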
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3.3 Model vs. Data Comparisons 

The application of the DDM to generate predictions of various
response measures is straightforward, once we specify the parameter set
$\Omega = \{\tau, \rho, c, T_{R0}\}$. From experience with the OCM, we choose

    $\tau$ = human's time-delay = 0.2 sec
    $\rho$ = observation noise-to-signal ratio = 0.01 (i.e., -20 dB)

After a sensitivity study was made on the DDM, we selected the remaining
parameters as

    c = co-efficient of variation = 0.3 (see Eq. (2.40))
    $T_{R0}$ = "fictitious" processing time = 3 sec (see Eq. (2.35))

The parameter set was held constant across experimental conditions. In
all cases, the subjective values $q_i$ are chosen to be the objective
rewards $r_i$. Pertinent data on task attributes, viz., arrival times,
processing times, values and velocities, for the experimental conditions
A, B, C, D and $B_y$ may be found in Ref. [10].

The five time-history metrics generated from the data and the model
are compared in Figs. 11-35 for the five experimental conditions A, B,
C, D and $B_y$. The ensemble data were obtained by averaging over $N_R$ runs
(e.g., $N_R$ = 48 for condition A). The results show striking similarity
between the data and model predictions. The model-data match is uniformly
good to excellent for all five experimental conditions studied. This
is most noteworthy considering that a nominal set of parameters was used
throughout, and that the decision problem involved is complex. To be
sure, there are some discrepancies, as in the decision probability ($P_{di}$)
comparisons: they show that the model predictions exhibit rapid variations
when compared to the data. This discrepancy is likely a result of
human inertias, e.g., neuro-muscular lags, decision time losses, etc. It
can be corrected by employing subjective values that depend on previous
actions, or by incorporating a switching cost in the attractiveness
measure of Eq (2.37). Since the discrepancies were not major in terms of
the overall performance comparisons $\bar{R}^M(t)$ vs. $\bar{R}^H(t)$, and since our focus
was on developing the structure of the human decision model rather than
fine-tuning it, these modifications were not explored in detail.

The average times spent on each task by the model and the human,
along with the six scalar measures of similarity for experimental conditions
A, B, C, D and $B_y$, are displayed in Tables 1 through 5. They also
indicate a reasonably close agreement between the model and data results.


Line \ Pass        1                2                3                4                5                6
    1        2.659 (2.619)    1.603 (1.464)    2.767 (2.607)    0.361 (0.321)    3.233 (4.155)    [illegible]
    2        5.510 (5.333)    4.132 (4.167)    3.619 (3.976)    3.763 (3.500)    1.922 (1.690)    [illegible]
    3        1.763 (1.583)    1.603 (1.631)    1.909 (3.250)    1.319 (2.643)    3.614 (3.833)    4.693 (4.548)
    4        2.022 (3.417)    3.588 (4.071)    1.921 (1.607)    4.532 (4.167)    2.549 (2.726)    [illegible]
    5        1.763 (1.536)    3.812 (3.607)    3.531 (3.500)    4.645 (4.440)    2.769 (2.583)    1.582 (1.631)

TABLE 1a: AVERAGE TIME SPENT ON EACH TASK IN EACH PASS FOR
CONDITION A (Brackets: Data)


SCALAR MEASURE    VALUE
    AM            0.05569
    IRM           0.05421
    ARM           0.00018
    TCM           0.0[?]019
    ATTM          0.0431
    EPM           0.05867

TABLE 1b: SCALAR MEASURES OF SIMILARITY FOR CONDITION A
(A Value of 0 Corresponds to a Perfect Fit)


Line \ Pass        1                2                3                4                5                6                7
    1        2.792 (3.010)    2.121 (3.615)    2.494 (2.625)    4.491 (4.281)    1.828 (3.042)    4.346 (4.062)    1.020 (0.062)
    2        5.582 (4.771)    1.514 (1.823)    3.547 (3.708)    4.213 (3.781)    2.428 (2.375)    4.467 (4.115)    (-)
    3        1.762 (1.615)    1.659 (1.573)    3.379 (3.521)    5.557 (4.385)    1.267 (1.656)    2.176 (2.458)    2.539 (0.802)
    4        1.585 (3.385)    3.473 (3.104)    1.427 (1.427)    3.357 (3.573)    3.379 (3.323)    [illegible]      (-)
    5        1.364 (1.740)    3.542 (2.906)    1.376 (1.677)    2.409 (2.573)    2.376 (2.510)    1.376 (0.917)    (-)

TABLE 2a: AVERAGE TIME SPENT ON EACH TASK IN EACH PASS FOR CONDITION B
(Brackets: Data)


SCALAR MEASURE    VALUE
    AM            0.06656
    IRM           0.09250
    ARM           0.00019
    TCM           0.028 x 10^-6
    ATTM          0.06706
    EPM           0.00933

TABLE 2b: SCALAR MEASURES OF SIMILARITY FOR CONDITION B


































Line \ Pass        1                2                3                4                5                6                7                8                9
    1        3.744 (4.095)    0.202 (0.905)    1.239 (2.838)    3.251 (0.811)    0.019 (0.405)    3.665 (3.270)    3.613 (3.608)    0.040 (0.514)    1.306 (2.932)
    2        0.810 (2.622)    3.651 (2.905)    3.245 (1.770)    3.541 (3.486)    0.300 (0.851)    3.387 (3.419)    3.766 (2.892)    1.235 (0.486)    (-)
    3        3.520 (3.392)    0.166 (0.716)    3.738 (3.608)    3.288 (3.527)    3.672 (3.541)    0.491 (1.338)    0.030 (0.230)    2.640 (1.149)    (-)
    4        0.023 (0.284)    0.173 (0.892)    0.077 (0.027)    3.402 (3.676)    0.152 (0.703)    3.292 (1.716)    0.244 (0.027)    [illegible]      [illegible] (-)
    5        3.566 (2.865)    3.755 (3.622)    3.699 (3.635)    2.288 (3.514)    3.513 (3.635)    3.377 (3.541)    3.094 (3.459)    (-)              (-)

TABLE 3a: AVERAGE TIME SPENT ON EACH TASK IN EACH PASS FOR CONDITION C
(Brackets: Data)


SCALAR MEASURE    VALUE
    AM            0.06676
    IRM           0.09269
    ARM           0.00045
    TCM           0.00370
    ATTM          0.07856
    EPM           0.00200

TABLE 3b: SCALAR MEASURES OF SIMILARITY FOR CONDITION C
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2.792 

(3.031) 

1.624 

(1.172) 

3.613 

(2.734) 

1.438 

(1.219) 

0.012 

(0.016) 

2.436 

(2.531) 

4.636 

(4.344) 

0.915 

(2.547) 

1.725 

(1.672) 

3.428 

(3.313) 

1.239 

(1.422) 

2.398 

(2.344) 

2.729 

(2.328) 

0.010 

(3.359) 

1.761 

(1.641) 

1.044 

(2.516) 

2.688 

(2.672) 

2.677 

(4.391) 

4.227 

(4.594) 

0.019 

(0.531) 

0.076 

(0.859) 

0.962 

(2.656) 

0.374 

(0.953) 

0.026 

(0.063) 

3.502 

(3.563) 

2.296 

(2.188) 

1.587 

(1.500) 

2.268 

(0.531) 

1.494 

(J.656) 

3.713 

(3.406) 

1.474 

(1.688) 

2.783 

(2.688) 

3.407 

(3.516) 

5.351 

(2.750) 

0.392 

(3.891) 


8 

9 

2.494 

(1.156) 

1.119 

(1.531) 

2.123 

(0.875) 

( - ) 

3.029 

(1.109) 

| 

BSD 

( - ) 

( — ) 

( - ) 

( — ) 
Line \ Pass        1                2                3                4                5                6                7
    1        2.791 (2.906)    3.368 (2.750)    2.576 (2.859)    4.437 (2.875)    2.059 (1.000)    4.326 (4.328)    0.825 (0.031)
    2        5.543 (2.250)    1.495 (1.859)    3.513 (3.859)    4.428 (4.656)    2.292 (2.656)    4.220 (3.766)    (-)
    3        1.371 (1.797)    1.205 (1.672)    3.439 (3.313)    5.272 (3.656)    1.398 (1.828)    2.380 (2.859)    2.074 (0.922)
    4        1.889 (0.561)    3.273 (3.016)    0.802 (0.016)    3.344 (3.719)    3.328 (3.828)    1.919 (0.125)    (-)
    5        1.546 (1.766)    3.275 (3.281)    1.541 (1.828)    2.377 (2.703)    2.352 (2.688)    1.227 (1.266)    (-)

TABLE 5a: AVERAGE TIME SPENT ON EACH TASK IN EACH PASS FOR CONDITION $B_y$
(Brackets: Data)


SCALAR MEASURE    VALUE
    AM            0.06105
    IRM           0.06944
    ARM           0.00066
    TCM           0.00094
    ATTM          0.07897
    EPM           0.00132

TABLE 5b: SCALAR MEASURES OF SIMILARITY FOR CONDITION $B_y$
3.4 Sensitivity Analysis of the DDM

Sensitivity studies were made on the DDM with respect to the parameter
set $\Omega$. The study showed that the model predictions exhibit greater
sensitivity to the parameter c, the co-efficient of variation, than to
the remaining parameters $\tau$, $\rho$, $T_{R0}$. Therefore, only the results of
varying the parameter c are presented in detail for experimental conditions
B and D, and results for the other parameters and the discount
factor, $\alpha$, are briefly summarized.

(i) Variations of co-efficient of variation, c: The parameter
c was varied in the range 0.1 - 1.0 and the model predictions
of percent reward earned, percent tasks completed and
the scalar measures of similarity* are plotted in Figs. 36
and 37 for the experimental conditions D and B, respectively.
As the value of c increases, the percent reward earned and
the percent tasks completed by the model decrease. This
is because the model allocates attention equally among tasks
at high values of c. This results in a reduction in the
number of tasks being completed and, hence, the reward, since
the value is credited only at the end of a successful task
completion. The tendency of the model to uniformly allocate
attention among tasks at large values of c causes a decrease
in the measure AM. However, all the other measures of
similarity, IRM, ARM, ATTM and EPM, generally increase with
increasing c. Note, in particular, that ARM, which is a
measure of overall performance of the model, exhibits good
sensitivity to c when compared to IRM, which is a measure of
incremental performance. Overall, the results indicate that
a value of c in the range 0.3 ± 0.1 gives a good fit to the
experimental data.

* The measure TCM is not shown, as it is similar to comparing percent
tasks completed by the human and the model.

FIG. 36: SCALAR MEASURES VERSUS CO-EFFICIENT OF VARIATION FOR EXPERIMENTAL CONDITION D

(ii) Variations of time-delay, $\tau$: As time-delay increases, the
uncertainty associated with the estimation of the decision
state increases. This, in turn, leads to a smaller number
of tasks being completed, and smaller reward being earned.
The measures AM and IRM were found to be relatively insensitive
(within 10 percent) to time-delay variations in the
range 0.15 - 0.50 sec, whereas ARM was quite sensitive to $\tau$.
Also, the measures ATTM and EPM exhibited modest increases
with time-delay. The overall results indicated that a value
of $\tau$ in the range 0.20 ± 0.05 sec is the best choice, a
range consistent with that employed in the OCM.

(iii) Variations of discount factor, $\alpha$: As $\alpha$ increases, the model
allocates attention to tasks with small processing times.
This results in a decrease of total reward earned, although
the number of tasks (of less value) completed may increase.
The measures AM, IRM and ATTM generally increase with $\alpha$,
whereas the overall measure ARM is insensitive (within 10
percent) to variations in the discount factor. Overall, a
value of $\alpha \approx 0$ was found to give the best possible match to
the data. Therefore, the parameter $\alpha$ was discarded from
the model.
(iv) Variations of observation noise ratio, $\rho$: The model
response was relatively insensitive (within 10%) to the observation
noise ratio in the range -15 dB to -25 dB. However,
the results showed some perplexing trends. At high values
of $\rho$ (i.e., less negative), the model earned more reward
and completed more tasks than at low values of $\rho$.
Therefore, the measures IRM and ARM decrease with increases
in $\rho$, but the measure AM appears to increase slightly. This
apparent anomaly may be due to a complex interaction between
$\rho$ and the co-efficient of variation, c.

(v) Variations of "fictitious" processing time, $T_{R0}$: As $T_{R0}$
increases, the attractiveness measure, $M_0(t)$, becomes more
negative. This reduces the "do nothing" probability, $P_{d0}(t)$,
and results in a non-decreasing total reward. A value of
$T_{R0} \approx 3$ to 5 sec was found to be a reasonable choice in the
present experimental context.

The above sensitivity results show that the choice of the parameter
set $\Omega$ is not critical, at least within a reasonable range of variation.
However, future research could determine whether or not the parameter set 
remains constant with modified decision paradigms, such as those suggested 
in section 4.1. 

3.5 Comparison with other Decision Models 

Since the decision situation basically involves dynamic sequencing
of tasks under uncertainty, a logical question is: "Couldn't we have
used one of the many sequencing rules that appear in the scheduling
literature [28] to model human decision strategy as effectively as the
DDM?" In this section, we answer this question in the negative by
comparing the DDM with four heuristic sequencing rules of scheduling
theory. We also contrast the DDM with two other decision rules, which may
be interpreted as special cases. The results illustrated here are for
condition D only, but they are representative of the other conditions as
well.

3.5.1 Comparison with Heuristic Sequencing Rules 

The following four decision rules were selected for comparison 
with the DDM: 

(i) Weighted shortest remaining processing time (WSRPT) rule:
At any time t, this rule chooses a task with maximum
$[r_i(t)/T_{Ri}(t)]$. Some advantages of the WSRPT rule are: (a)
it minimizes the weighted completion times as well as the
weighted waiting times of tasks being sequenced, and (b)
it does not require any look-ahead features, even though
tasks become available intermittently, i.e., it is a
dispatching decision rule. The major drawbacks of this
rule are: (a) it stipulates a (1,0) type of decision
and does not consider randomness in human response; (b)
it does not take into account the time available to work
on a task, although it does minimize average lateness of
tasks (if work is allowed even after the deadline has
passed); (c) it assumes that $T_{Ri}(t)$ is deterministic; and
(d) it discriminates among tasks to the greatest possible
extent, resulting in increasingly excessive waiting time
for low priority tasks. The first two cited limitations
of WSRPT are removed in the decision rules (ii) and (iii),
respectively.

(ii) WSRPT with stochastic choice: This rule is similar to
(i), except that it employs Luce's choice axiom to render
the decision rule random, as in the DDM.

(iii) Modified WSRPT: At any time t, this rule selects a task
with maximum $[r_i(t)/T_{Ri}(t)]\,u[T_{ai}(t) - T_{Ri}(t)]$, where $u(\cdot)$
is a unit step function. This rule is similar to (i), but
takes into consideration the time available to work on a
task via a unit step function involving slack time.

(iv) Weighted slack time (WST) rule: At any time t, this rule
selects a task with maximum $[r_i(t)/(T_{ai}(t) - T_{Ri}(t))]$. This
scheme is often used with WSRPT sequencing to overcome the
limitation (d) of the WSRPT rule.
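
The four heuristic rules can be compared side by side in a short sketch. The code below is an illustration under stated assumptions, not the implementation used in the study: each task is a hypothetical dictionary holding its reward, remaining processing time (`t_required`) and available time (`t_available`); the unit step is taken as 1 at zero slack; tasks without positive slack are excluded from the WST rule; and Luce's choice axiom is applied as selection with probability proportional to the (positive) index value.

```python
import random


def wsrpt_index(task):
    """Rule (i): reward per unit of remaining processing time."""
    return task["reward"] / task["t_required"]


def modified_wsrpt_index(task):
    """Rule (iii): WSRPT index gated by a unit step on the slack time."""
    slack = task["t_available"] - task["t_required"]
    return wsrpt_index(task) if slack >= 0 else 0.0


def wst_index(task):
    """Rule (iv): reward per unit of slack; non-positive slack is excluded."""
    slack = task["t_available"] - task["t_required"]
    return task["reward"] / slack if slack > 0 else float("-inf")


def choose(tasks, index):
    """Deterministic (1,0) decision: take the task with the largest index."""
    return max(tasks, key=index)["name"]


def luce_probabilities(tasks, index):
    """Rule (ii): choice probabilities proportional to positive index values."""
    weights = [index(task) for task in tasks]
    total = sum(weights)
    return [w / total for w in weights]


def choose_stochastic(tasks, index, rng):
    """Draw one task at random according to the Luce probabilities."""
    probs = luce_probabilities(tasks, index)
    return rng.choices(tasks, weights=probs, k=1)[0]["name"]


# Hypothetical task set: A is valuable but can no longer be finished in time.
tasks = [
    {"name": "A", "reward": 20.0, "t_required": 5.0, "t_available": 4.0},
    {"name": "B", "reward": 6.0, "t_required": 2.0, "t_available": 6.0},
    {"name": "C", "reward": 4.0, "t_required": 4.0, "t_available": 5.0},
]
```

On this task set the three deterministic rules disagree: plain WSRPT engages the infeasible task A, the modified rule falls back to B, and WST prefers C, whose slack is smallest.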

Table 6 compares DDM performance with those of the heuristic sequencing
rules (i) - (iv) via the scalar measures of similarity for the
experimental condition D. The figures in brackets display the percent
decrement in performance of a heuristic sequencing rule, using measures
for DDM as a base. The results clearly indicate that the performance of
DDM is significantly better than the sequencing rules (i) - (iv). It
should also be noted that the WSRPT rule with stochastic choice does better
than a pure WSRPT, thereby confirming randomness in human decision behavior,
as well as the inadequacy of Monte Carlo models of the type
espoused by Tulga [8]. These results also cast doubt on models that
assume perfect human perception of the task attributes.

3.5.2 Comparison with Related Decision Models 


Several decision models that are derivatives of the DDM were 
studied; two particularly interesting ones are discussed here. 





















WST: Weighted slack time 
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(i) 


Related model 1: This model assumes that the subjective
losses, $q_i(t)$, are zero. Thus, the attractiveness measures
of Eqs. (2.35) and (2.37) become, respectively,

$$M_0(t) = 0$$

$$M_i(t) = r_i(t)\,\eta_i(t); \quad i \in A(t) \tag{3.13}$$

(ii) Related model 2: In this model, the attractiveness measures
are given by

    [Eq. (3.14): not legible in the source]                       (3.14)

This model may be derived from Eqs (2.35) and (2.37) by
letting all the available times $T_{am}(t) \to \infty$, $m \neq i$, while
computing $\omega_{ij}$, and by a further substitution (not legible in
the source) in evaluating $\eta_i(t)$.

The form of the attractiveness measures in Eq (3.14) is
similar to those of "information-integration rules" of behavioral
decision theory [2]. A notable feature of this
model is that it affords simple computation, and does not
require any (numerical) approximations in its evaluation.

The results of Table 7 show that the simplified models perform
almost as well as the DDM. Model 1 matches the data well with respect
to measures AM and IRM, but performs poorly with respect to the error
probability measure, EPM. In fact, this is what motivated us to include
the subjective losses in the attractiveness measures.


















Table 7: COMPARISON OF DDM WITH RELATED DECISION RULES 



















3.6 Summary 

This chapter described the results on model-data validation efforts. 

In order to validate the model, several time-history and scalar measures 
of similarity between the model predictions and the experimental data were 
proposed. The model-data validation effort consisted of comparing the
time-history metrics, such as the decision probabilities, completion
probabilities, incremental reward, accumulated reward and error probability.
Validation on the basis of scalar measures consisted of comparing the 
average time spent on each task, the difference between the incremental 
and accumulated rewards of the model and data, etc. 

When viewed in total, the model-data comparisons for all the cases 
studied are excellent. This is achieved with a simple, intuitively 
appealing decision model, using a nominal set of parameters throughout. 

To be sure, there are some discrepancies, as in decision probability 
comparisons. However, these mismatches are not major, and can be 
corrected by minor model modifications. The model predicted trends 
generally agree with the data. 

Sensitivity analysis of the DDM has shown that the choice of the 
parameter set is not critical, at least within a reasonable range of 
variation. The performance of DDM was contrasted with those of several 
heuristic sequencing rules of scheduling theory, as well as some related 
decision models. The results point to the clear superiority of the DDM in 
representing human task sequencing performance. 





IV. DISCUSSION AND EXTENSIONS 


The primary purpose of this research has been to gain an understanding
of human information-processing and task selection procedures in dynamic
multi-task environments. The approach has been to combine the
results of a joint analytic and experimental program into a normative
dynamic decision model (DDM) of human task sequencing performance. To
this end, a general multi-task paradigm was developed that retains the
essential features of human task selection in a manageable, yet manipulative,
context. Via this framework, we have studied the effects of various
task related variables on the human decision processes. The model that
has emerged from this effort could form a small, but significant, step
towards human modeling in complex supervisory control systems. In the
following, several suggestions for further research are given. These
include model refinements, model application to decision-aiding and the
modeling of multi-human decision-making in multi-task systems.

4.1 Modifications of the Decision Paradigm 

The concept of a decision state is fundamental to our analytic
modeling approach. The human's decision strategy depends directly on his
estimates of the decision state, once the task values and a performance
metric are given. In the present experimental context, the decision state
is related to the task state via a simple functional transformation. Also,
the tasks are assumed to be independent and the task values are constant
as the bar moves across the CRT screen. This simplicity in the experimental
paradigm enabled us to develop the DDM by focusing on the underlying
structural aspects of the human decision-processes, without the attendant
task complexities. However, future tests of the DDM should consider more
intricate task structures, such as those involving non-stationary task
attributes, task dependency and resource constraints. These and other
extensions are described below:

(i) Non-stationary task attributes: In many realistic situations,
the task attributes (e.g., value and velocity) may evolve in
time, or they may vary as a function of the human's decisions.
For example, an AAA gunner who fires at an enemy target may
find that the target has changed course and is diving towards
the gunner's position. This results in a change of the target's
value and the time available to engage the target. The present
experimental paradigm can be modified to include time-varying
task characteristics in a straightforward manner.
However, the analytic framework of the model is valid almost
in toto for this case.

(ii) Task dependency: Since the subsystems are interconnected
physically in a complex system, the tasks are, in general,
correlated. This correlation may assume the form of precedence
relations and/or dependency among the attributes of
different tasks. Precedence relations pertain to the existence
of technological restrictions on the task sequence, or
the partial ordering among the tasks. The precedence relations
generally take the form of an assembly tree or a
branching tree. A relevant example of such a situation is
the problem of multi-RPV control, where some RPVs (e.g.,
ECM) must be brought over the target area before the others.
This situation can be incorporated into the experimental
paradigm by not allowing the subject to engage certain tasks
until he has successfully completed their prerequisite tasks.
In this case, the analytic modeling of the decision process
involves a two-step procedure in which the sequencing phase is
preceded by a labeling phase that identifies feasible action
subsets. Thus, only the set of feasible decisions, P(t),
along with any human limitations (e.g., loss of decision time
in the labeling phase), need to be identified. On the other
hand, the task correlations due to dependency among the
attributes of different tasks can be incorporated in the form
of coupled subsystems in a state space formulation. This
will undoubtedly increase the computational complexity of the
model. Hopefully, only a small number of tasks have such
interactions.

(iii) Resource constraints: In practice, resources, such as fuel
and ammunition, are finite. Since the availability of
resources has implications in the human decision-making
processes, future research should investigate human decision
behavior with resource constraints. In the present experimental
paradigm, a displayed resource may be the total time a
subject can expend in processing tasks. This research could
delineate the nature of differences in human behavior under
constrained and unconstrained situations.

(iv) A Related Paradigm: Although the present experimental paradigm
is well-suited to understand the basic issues of a multi-task
decision problem, it is far too abstract to be of use in
a specific application. A means to overcome this limitation
and, at the same time, be close to the manual control paradigms
is to design an experimental situation wherein the human
interacts simultaneously with several dissimilar dynamic
processes. The task characteristics can be manipulated by
varying the nature and occurrence of disturbances acting on
the processes. This experimental paradigm is ideally suited
to study all the issues of a multi-task decision problem, viz.,
task detection, task sequencing and task implementation. The
conceptual framework of the DDM is still valid. The modeling
process poses interesting, albeit solvable, control theoretic
problems.
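
The two-step procedure described under task dependency, a labeling phase followed by a sequencing phase, can be sketched as follows. This is a hedged illustration only: the prerequisite map, the task names (loosely modeled on the multi-RPV example) and the attractiveness values are hypothetical, and the sequencing step simply takes the most attractive feasible task rather than applying the DDM itself.

```python
def feasible_tasks(available, completed, prerequisites):
    """Labeling phase: keep available tasks whose prerequisites are all done."""
    return {
        task
        for task in available
        if task not in completed and prerequisites.get(task, set()) <= completed
    }


def select_task(available, completed, prerequisites, attractiveness):
    """Sequencing phase: most attractive feasible task, or None (do nothing)."""
    feasible = feasible_tasks(available, completed, prerequisites)
    if not feasible:
        return None
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(feasible), key=lambda task: attractiveness[task])


# Hypothetical precedence relation: the ECM vehicle must be handled first.
prerequisites = {"strike_rpv_1": {"ecm_rpv"}, "strike_rpv_2": {"ecm_rpv"}}
available = {"ecm_rpv", "strike_rpv_1", "strike_rpv_2"}
attractiveness = {"ecm_rpv": 1.0, "strike_rpv_1": 5.0, "strike_rpv_2": 3.0}

first = select_task(available, set(), prerequisites, attractiveness)
second = select_task(available, {"ecm_rpv"}, prerequisites, attractiveness)
```

Although the strike tasks are more attractive, the labeling phase removes them from consideration until their prerequisite has been completed.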

4.2 Computer-aided Decision-making 

With rapid advances in technology and higher levels of automation,
computers are increasingly being used in decision-making situations. If
the computer is to be accepted by the human as a decision-aid, or if
decision-making responsibility is to be allocated between human and computer, then
there must exist a symbiotic relationship between the two. Computer-made
decisions and/or information displays should be compatible with human
processing goals, implying that the computer would require a model of the
human! Successful interaction between human and computer could reduce
human work load, increase probability of correct decisions and reduce
system risk.

The DDM developed in chapter II is used in a covariance propagation
mode to predict ensemble or averaged statistics of human response. However,
for decision-aiding applications, one needs a Monte Carlo (or
sample-path) simulation of the model. The implementation of the sample-path
version of the DDM is similar to that of the OCM [31]. That is, the
model mimics the human actions, complete with random number generators
that reproduce inherent human randomness. The simulation generates
time-histories of human decisions in response to any given task arrival pattern.
Using a Monte Carlo model, one can study the potential application of
computer-aiding at various levels as outlined below:

(i) Information-processing mode: In this mode, the computer,
using a model of the human or its own internal model, displays
information relevant to decision-making. The displayed information
can be of various types: an assessment of the present
and, possibly, future task states ("raw data") or of the
decision states ("reduced data"); or the detection of new
tasks while the human is attending to a task. Note that, in this
mode, the computer provides information at the pre-decision
level. If this type of aiding is to be effective, the information
must be accessible in real time, and it must reduce the
memory load of the human.

(ii) Decision-prompting mode: In this mode, the model provides the
human with guidelines for making a decision so that he can
concentrate on a few vital decision alternatives. The computerized
model may be exercised to rank-order the importance of
various decisions via the attractiveness measures. These
metrics are used only as prompting information, with the DM
free to select any of the alternatives. If the model is
truly representative of human decision-processes, there should
be high correlation between human and computer decisions.
Moreover, this mode of aiding may be used to investigate the
human's ability to detect decision blunders by the computer,
and it may answer the important question: Should a machine, in
order to help or replace us, act like us?

(iii) Decision-sharing mode: In situations where the human potentially
encounters more tasks than he can satisfactorily perform,
allocation of decision-making responsibility between
human and computer may be the best mode of human-computer
interaction. In order that the human-computer interaction be
efficient, the actions of the computer must be transparent to
the human, and the computer should be able to infer what the
human is doing. Thus, a model of the human allows for covert
communication between the human and computer, and reduces the
need for overt communication. Moreover, a model of the human
can be used to predict future courses of action by the human
so that the computer can strive to avoid them. This results
in a reduction of conflicts, a particularly desirable feature
under high work load situations.
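
The rank-ordering step of the decision-prompting mode, and the check for agreement between human and computer decisions, might look like the sketch below. It assumes that the attractiveness measures are already available as a dictionary per decision instant; the function names, the alternative labels and the use of a simple agreement rate (rather than a formal correlation statistic) are all illustrative choices, not part of the report.

```python
def rank_alternatives(attractiveness):
    """Order decision alternatives from most to least attractive."""
    return sorted(attractiveness, key=attractiveness.get, reverse=True)


def agreement_rate(human_choices, model_rankings, top_n=1):
    """Fraction of decisions where the human's choice is in the model's top_n."""
    hits = sum(
        1
        for choice, ranking in zip(human_choices, model_rankings)
        if choice in ranking[:top_n]
    )
    return hits / len(human_choices)


# Hypothetical attractiveness values at one decision instant.
measures = {"line_1": 0.2, "line_2": 0.9, "do_nothing": -0.1}
prompt = rank_alternatives(measures)

# Two hypothetical decision instants with the same ranking.
human_choices = ["line_2", "line_1"]
rankings = [prompt, prompt]
top1 = agreement_rate(human_choices, rankings, top_n=1)
top2 = agreement_rate(human_choices, rankings, top_n=2)
```

Only the ranking is shown to the decision-maker in this mode, so the final choice remains with the human; a high agreement rate would support the claim that the model is representative of human decision processes.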

4.3 Multiple Human Decision-makers

The study of a multi-task system with multiple decision-makers can
be approached at various levels of complexity. The analytic framework of
the DDM can be extended, at least conceptually, to a centralized decision-making
system in which tasks arrive at a central supervisor who, in turn,
routes them to various subordinate decision-makers. The individual
decision-makers have the responsibility of sequencing tasks in their
respective queues. The overall decision-process involves finding a global
routing strategy for the supervisor and local sequencing strategies for
the individual subordinates, taking into account inherent and interhuman
randomness.

A more realistic and challenging problem is the modeling of multiple
DMs in distributed multi-task systems. Here, tasks arrive at each individual
DM. An individual DM has to determine whether to keep an arriving
task for himself or send it to someone else, and which task, if any, he
should process. Thus, the decision-process requires the specification of
a local routing strategy and a local sequencing strategy for each DM. The
decision process is affected by the communication, the information pattern at
each DM, hierarchical structures, and inter-human randomness and variability,
to name but a few factors.

4.4 Summary

In this chapter, we have delineated three logical extensions of the
present research. The first relates to exercising the model in more complex
multi-task situations such as those involving non-stationary task
attributes, task dependency or resource allocation constraints. This
research serves to refine and validate the DDM. The second extension
seeks to use the model for studying computer-aided technology. In this
context, three modes of interaction between the human and computer are
identified, viz., the information-processing mode, the decision-prompting
mode and the decision-sharing mode. It was concluded that, in all modes
of operation, the computer must have, as a reference, an internal model of the
human for effective man-computer interaction. Finally, the third extension
relates to developing models suitable for multi-task systems with
multiple decision-makers. This research poses problems of immense
analytic difficulty, but, if solved, will be extremely useful in understanding
the human component of a complex supervisory control system. It
is hoped that future contributions will be along these lines.







APPENDIX A 


LUCE’S CHOICE AXIOM 

The observed inconsistency and uncertainty associated with human
decision behavior have led to two classes of probabilistic choice
models. These are the random utility models and the constant utility
models. The random utility models (called the "discriminable dispersion
models" by psychologists and "probit analysis" by statisticians) assume
that the utility, or the value, of each alternative is intrinsically
variable at the subjective level, and that the alternative with the
highest momentary value is chosen. Thus, in these models, the uncertainty
in choice is attributed to the randomness in utility. The constant
utility models, on the other hand, assume that the value assigned to each
of the alternatives is fixed, but that the choice is a probabilistic
function of these values. Here, the randomness in choice is attributed
to uncertainty in the decision rule. Although these two types of choice
representation are very different in psychological terms, they are somewhat
compatible in mathematical terms. This is because some forms of
probabilistic choice models can be interpreted as either random or constant
utility models [24,25]. The random utility models have their origins
in the works of Thurstone on psychophysical scaling [31] and later Block
and Marschak on probabilistic theories of response [32], whereas the
constant utility models have largely been influenced by Luce's choice
axiom [24-27].
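
The compatibility of the two model classes can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a construction taken from the text: it assumes that the momentary utility of each alternative is log v(x) plus independent Gumbel noise, a standard case in which the random utility model reproduces the constant utility form P(x) = v(x) / sum of v. The scale values v and all function names are hypothetical.

```python
import math
import random


def luce_probabilities(values):
    """Constant utility view: P(x) = v(x) / sum of v over all alternatives."""
    total = sum(values.values())
    return {x: v / total for x, v in values.items()}


def gumbel(rng):
    """Draw one standard Gumbel variate by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); essentially never triggered
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_random_utility(values, trials=50_000, seed=0):
    """Random utility view: pick the alternative with the highest momentary utility."""
    rng = random.Random(seed)
    counts = dict.fromkeys(values, 0)
    for _ in range(trials):
        # Each alternative receives its own independent noise draw.
        best = max(values, key=lambda x: math.log(values[x]) + gumbel(rng))
        counts[best] += 1
    return {x: c / trials for x, c in counts.items()}


# Hypothetical scale values for three alternatives.
values = {"x": 1.0, "y": 2.0, "z": 3.0}
expected = luce_probabilities(values)
simulated = simulate_random_utility(values)
```

With Gaussian rather than Gumbel noise, the simulated frequencies would in general differ from the ratio form, which is why the two classes are only "somewhat" compatible.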

Luce's choice axiom is a probabilistic formulation of Arrow's [33]
famed "law of irrelevant alternatives". The axiom, in essence, says that
our preferences between two alternatives (stimuli) do not change when
other alternatives are added to, or discarded from, the overall set of
alternatives. The axiom has been invoked, implicitly or explicitly, in
psychophysical scaling, utility theory, decision theory, stochastic
learning theory and in many psychometric models. This is because of its
simplicity and the resulting computational attractiveness. In the following,
the axiom is formally stated and its implications for developing
a stochastic choice model are discussed.

A.1 Notation and Preliminaries

Let $T = \{x, y, z, \ldots\}$ denote a finite set of independent alternatives
(e.g., x is the minimum value of some random variables associated with a
process state transition, x is the maximum attractiveness measure, etc.).
We use A, B, C, ... to denote the non-empty subsets of T. We let $P_A(x)$
represent the probability of choosing an alternative x when only the subset
A of alternatives is offered to the DM. The usual probability
axioms are assumed to hold for all A. Clearly, $P_T(x)$ is the probability
of selecting x when the entire set T is presented to the DM and

$$P_T(A) = \sum_{x \in A} P_T(x)$$

For brevity, we let $P(x:y)$ denote $P_{\{x,y\}}(x)$, the probability that the DM
selects x when asked to choose between x and y. Also, we assume that
$P(x:x) = \tfrac{1}{2}$.

A.2 Choice Axiom 

The choice axiom, in essence, states that the removal of some alternatives
does not alter the relative probabilities of choice among the
remaining alternatives. That is, the presence or absence of an alternative
is irrelevant to the relative probabilities of choice between two
other alternatives, although the absolute values of these probabilities
will generally be affected. Formally, for all $x \in A \subset T$,

$$P_A(x) = P_T(x/A) \tag{A.1}$$

whenever the conditional probability exists.

The choice axiom says that the choice from the subset A is independent
of what else may have been available. In other words, even when the
entire set T is offered to the DM, if we only look at those occasions
when the choices are made from the subset A, then the probability of
selecting x from A, $P_T(x/A)$, is identical to the probability of selecting
x from A, $P_A(x)$, when only the subset A was presented to the DM in the
first place.

By the definition of conditional probability,

$$P_T(x/A) = \frac{P_T(x, A)}{P_T(A)} = \frac{P_T(x)}{P_T(A)}$$

Eq. (A.1) can be rewritten as

$$P_T(x) = P_T(A) \cdot P_A(x) \tag{A.2a}$$

or

$$P_A(x) = \frac{P_T(x)}{P_T(A)} \tag{A.2b}$$
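
As a numerical check of Eq. (A.2a), the sketch below assumes a ratio-scale choice model in which $P_A(x)$ is proportional to a fixed positive scale value v(x). This model and the scale values are assumptions made for illustration; the code simply verifies that such a model satisfies the axiom for every subset A and every x in A, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def choice_prob(v, A, x):
    """P_A(x) under the assumed ratio-scale model: v(x) / sum of v over A."""
    return Fraction(v[x]) / sum(Fraction(v[y]) for y in A)


def subset_prob(v, T, A):
    """P_T(A): the sum of P_T(x) over all x in A."""
    return sum(choice_prob(v, T, x) for x in A)


def satisfies_choice_axiom(v):
    """Check P_T(x) == P_T(A) * P_A(x) for every non-empty subset A and x in A."""
    T = tuple(v)
    for size in range(1, len(T) + 1):
        for A in combinations(T, size):
            for x in A:
                lhs = choice_prob(v, T, x)
                rhs = subset_prob(v, T, A) * choice_prob(v, A, x)
                if lhs != rhs:
                    return False
    return True


# Hypothetical scale values.
v = {"x": 1, "y": 2, "z": 3}
axiom_holds = satisfies_choice_axiom(v)
p_x_given_xy = choice_prob(v, ("x", "y"), "x")
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.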

Eq. (A.2) provides an alternate interpretation of the choice axiom. It
says that the overall probability of choosing an element x from the set
T, $P_T(x)$, may be viewed as a multi-stage process. First, the probability
of choosing A from T, $P_T(A)$, is estimated, and then the probability of
choosing x from A, $P_A(x)$, is computed. Note that the subset A may be
subdivided a number of times until a single element x remains. Moreover,
the axiom implies that the product $P_T(A) \cdot P_A(x)$ is independent
of the way in which T is partitioned into subsets! Clearly, intuition
suggests that the axiom cannot be expected to hold in complex interdependent
situations. We will discuss the limitations of the axiom
later.
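
The multi-stage interpretation can be sketched as follows. The probabilities $P_T$ below are hypothetical, and the stage probabilities are obtained from Eq. (A.2b), i.e., the choice axiom is assumed to hold. Two different chains of nested subsets ending in {x} are compared, and both recover $P_T(x)$.

```python
from fractions import Fraction

# Hypothetical choice probabilities over the full set T.
P_T = {
    "w": Fraction(1, 10),
    "x": Fraction(2, 10),
    "y": Fraction(3, 10),
    "z": Fraction(4, 10),
}


def p_subset(outer, inner):
    """P_outer(inner) implied by the choice axiom: P_T(inner) / P_T(outer)."""
    if not inner <= outer:
        raise ValueError("inner must be a subset of outer")
    return sum(P_T[e] for e in inner) / sum(P_T[e] for e in outer)


def staged_probability(chain):
    """Multiply the stage probabilities along a chain T, A, B, ..., {x}."""
    prob = Fraction(1)
    for outer, inner in zip(chain, chain[1:]):
        prob *= p_subset(outer, inner)
    return prob


T = set(P_T)
route_1 = [T, {"x", "y", "z"}, {"x", "y"}, {"x"}]  # three stages
route_2 = [T, {"w", "x"}, {"x"}]                   # two stages
via_route_1 = staged_probability(route_1)
via_route_2 = staged_probability(route_2)
```

Under the axiom the product telescopes, so the chosen subdivision does not matter. For a choice mechanism that violates the axiom, the stage probabilities would have to be measured separately and the two routes could disagree.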

Below, we prove some trivial consequences of the choice axiom as a 
prelude to deriving a stochastic choice model. 

Lemma 1 

Suppose that the choice axiom holds for all A, $x \in A \subset T$.

(i) If $P_T(x) \neq 0$, then $P_A(x) \neq 0$.

(ii) If $P_T(x) = 0$ and $P_T(A) \neq 0$, then $P_A(x) = 0$.

(iii) If $P_T(y) = 0$ and $y \neq x$, then $P_T(x) = P_{T-\{y\}}(x)$.

Proof 


(i) Since $x \in A$, $P_T(x) \neq 0$ implies $P_T(A) \neq 0$. Thus,

$$P_A(x) = P_T(x/A) = \frac{P_T(x)}{P_T(A)} \neq 0.$$

(ii) Since $P_T(x) = 0$ and $P_T(A) \neq 0$,

$$P_A(x) = \frac{P_T(x)}{P_T(A)} = 0.$$

(iii) $P_T(y) = 0$ implies $P_T(T-\{y\}) = 1$. Using this and the fact that $x \neq y$, we have

$$P_T(x/T-\{y\}) = P_T(x).$$

By the choice axiom,

$$P_T(x/T-\{y\}) = P_{T-\{y\}}(x) = P_T(x).$$



The result (iii) shows that an alternative that is never chosen may be 
removed from the set without affecting the choice probabilities. The 
fact that this process may be repeated in any order, until all the choice 
probabilities are positive, is guaranteed by (i) and (ii). 

A.3 Stochastic Choice Model 


Here, we prove that if the choice axiom holds, all the choice probabilities
are determined by the pairwise probabilities. In the following,
we assume, without loss of generality, that the choice probabilities
are positive.

Theorem 1 

If for all $x \in T$, $P_T(x) \neq 0$ and if the choice axiom holds for all x
such that $x \in A \subset T$, then

(i)
$$\frac{P(x:y)}{P(y:x)} = \frac{P_T(x)}{P_T(y)} = \frac{P_A(x)}{P_A(y)} \tag{A.3}$$

and

(ii)
$$P_T(x) = \frac{1}{1 + \sum_{y \in T-\{x\}} \dfrac{P(y:x)}{P(x:y)}} \tag{A.4}$$



Proof 


(i) By the choice axiom,

$$P_T(x) = P_{\{x,y\}}(x) \cdot P_T(\{x,y\}) = P(x:y)\,[P_T(x) + P_T(y)]$$

so

$$P_T(x)\,[1 - P(x:y)] = P(x:y)\,P_T(y)$$

Noting that $P(y:x) = 1 - P(x:y)$, we have

$$\frac{P(x:y)}{P(y:x)} = \frac{P_T(x)}{P_T(y)}$$

The result can be extended to include any subset $A \subset T$ that contains the
alternatives x and y. The condition in Eq. (A.3) states that the odds of
x being chosen over y from any set containing them equal the odds of a
binary choice of x over y. This rather important consequence of the choice
axiom is variously referred to as the independence from irrelevant alternatives
in economics and Clarke's Constant Ratio Rule in psychology, since
it was independently proposed by Clarke [34].

(ii) To prove part (ii), consider the term

$$1 + \sum_{y \in T-\{x\}} \frac{P(y:x)}{P(x:y)} = \frac{P_T(x)}{P_T(x)} + \sum_{y \in T-\{x\}} \frac{P_T(y)}{P_T(x)} = \frac{1}{P_T(x)} \sum_{y \in T} P_T(y) = \frac{1}{P_T(x)}$$

The required result immediately follows. Eq. (A.4) is similar to Eqs.
(2.30), (2.38) and (2.39). Other consequences of the choice axiom, such
as stochastic transitivity and the existence of a ratio scale, may be
found in [24-27].
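
Eq. (A.4) translates directly into a short computation. In the sketch below, the function name and the pairwise probabilities are hypothetical. The pairwise values are generated from assumed scale values v so that they are consistent with the choice axiom, and the full-set probabilities are then recovered from the pairwise values alone.

```python
from fractions import Fraction


def choice_probs_from_pairwise(alternatives, pair):
    """Eq. (A.4): P_T(x) = 1 / (1 + sum over y != x of P(y:x) / P(x:y)).

    pair[(a, b)] holds the binary choice probability P(a:b).
    """
    probs = {}
    for x in alternatives:
        odds_against = sum(
            pair[(y, x)] / pair[(x, y)] for y in alternatives if y != x
        )
        probs[x] = 1 / (1 + odds_against)
    return probs


# Hypothetical scale values, used only to build consistent pairwise data.
v = {"x": 1, "y": 2, "z": 3}
pair = {
    (a, b): Fraction(v[a], v[a] + v[b])
    for a in v
    for b in v
    if a != b
}
P_T = choice_probs_from_pairwise(list(v), pair)
```

If the pairwise probabilities are not consistent with the choice axiom, the values returned by this formula need not sum to one, which offers a simple diagnostic for data that violate the axiom.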

A.4 Discussion 

Luce's choice axiom provides a powerful means to construct a rational,
probabilistic theory of individual choice behavior. The empirical
evidence [27] suggests that it works very well in some situations, and
not so well in others. Here, we summarize the advantages and limitations
of the axiom, and indicate decision situations where it can profitably
be applied.

The primary advantages of the choice axiom are that it allows for
easy computation of choice probabilities via pairwise comparisons, and
that it provides a simple means to add new alternatives or subtract from
existing ones. The latter also points to a weakness in the axiom, in
that the independence of irrelevant alternatives is implausible in
situations where some of the alternatives are similar. This is exemplified
by the often cited objection of Debreu [35] to the choice
axiom. Suppose we are choosing among a pony (x), a bicycle (y) and
another bicycle (z). That is, $T = \{x, y, z\}$. Assume that all pairwise
choice probabilities equal $\tfrac{1}{2}$. Since y and z are duplicates of each
other, one expects that $P_T(x) = \tfrac{1}{2}$ while $P_T(y) = P_T(z) = \tfrac{1}{4}$. Data
support this intuition. However, if the choice axiom is assumed to hold,
then all trinary choice probabilities equal $\tfrac{1}{3}$. This example shows that
two alternatives (x and y), which are equivalent in one context (i.e.,
$P(x:y) = \tfrac{1}{2}$), are not equivalent in another context (i.e., $P_T(x) \neq P_T(y)$),
contrary to independence of irrelevant alternatives. Thus, the application
of the choice axiom should be restricted to situations where the alternatives
can be assumed to be distinct and independent, such as those in
the present work.
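
The pony and bicycle example can be worked through numerically. The sketch below applies Eq. (A.4) with all pairwise probabilities set to one half, and compares the resulting odds with those of the intuitive probabilities. The variable names and the helper function are illustrative only.

```python
from fractions import Fraction

half = Fraction(1, 2)
alternatives = ["pony", "bicycle_1", "bicycle_2"]

# Prediction of the choice axiom via Eq. (A.4), with every P(a:b) equal to 1/2.
axiom = {
    x: 1 / (1 + sum(half / half for y in alternatives if y != x))
    for x in alternatives
}

# Intuitive probabilities when the two bicycles are duplicates of each other.
intuitive = {
    "pony": Fraction(1, 2),
    "bicycle_1": Fraction(1, 4),
    "bicycle_2": Fraction(1, 4),
}


def odds(probs, a, b):
    """Odds of choosing a over b from the full set."""
    return probs[a] / probs[b]


binary_odds = half / (1 - half)                       # P(x:y) / P(y:x)
axiom_odds = odds(axiom, "pony", "bicycle_1")
intuitive_odds = odds(intuitive, "pony", "bicycle_1")
```

The axiom's odds match the binary odds, as Eq. (A.3) requires, whereas the intuitive probabilities give odds of two to one and therefore violate the constant ratio rule.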